[jira] [Resolved] (HBASE-28016) hbck2 should support change region state of meta

2023-09-18 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-28016.

Resolution: Won't Fix

> hbck2 should support change region state of meta
> 
>
> Key: HBASE-28016
> URL: https://issues.apache.org/jira/browse/HBASE-28016
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase-operator-tools, hbck2
>Affects Versions: hbase-operator-tools-1.2.0
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The region state of meta is stored in ZooKeeper; if the state is wrong, we 
> need a way to change it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28016) hbck2 should support change region state of meta

2023-09-18 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766354#comment-17766354
 ] 

Zheng Wang commented on HBASE-28016:


I solved this in another way.



[jira] [Commented] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2023-08-16 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755311#comment-17755311
 ] 

Zheng Wang commented on HBASE-26987:


I updated this PR. Could you help review it?

Thanks. [~zhangduo] 

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compactions, we set selectNow to false, so file selection is 
> deferred until the compaction actually runs. This has a side effect: if 
> another compaction is slow, many compactions may be queued, because the 
> filesCompacting of the HStore is still empty in the meantime.
> The attachments show an example: there are 154 regions and about 2000 HFiles, 
> but the compaction queue length grows to 1391, which causes confusion and may 
> trigger unexpected alarms.
> My approach is to limit the number of queued compactions, computed from 
> filesNotCompacting and hbase.hstore.compaction.max.
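
The limit described above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, and the ceiling division is a guess at how a cap might be derived from the number of files not yet being compacted and hbase.hstore.compaction.max. The issue does not spell out the formula, and the value 10 is just an example setting.

```java
// Hypothetical sketch; not the actual HBase patch.
final class CompactionQueueLimit {

    /**
     * Upper bound on queued compactions for one store: enough requests to
     * cover every file that is not already being compacted (ceiling division).
     */
    static int maxQueuedCompactions(int filesNotCompacting, int maxFilesPerCompaction) {
        if (maxFilesPerCompaction <= 0) {
            throw new IllegalArgumentException("maxFilesPerCompaction must be positive");
        }
        if (filesNotCompacting <= 0) {
            return 0;
        }
        return (filesNotCompacting + maxFilesPerCompaction - 1) / maxFilesPerCompaction;
    }

    /** Returns true if one more compaction request may be queued for the store. */
    static boolean mayEnqueue(int alreadyQueued, int filesNotCompacting, int maxFilesPerCompaction) {
        return alreadyQueued < maxQueuedCompactions(filesNotCompacting, maxFilesPerCompaction);
    }

    public static void main(String[] args) {
        int maxFiles = 10; // example value for hbase.hstore.compaction.max (assumption)
        System.out.println(maxQueuedCompactions(25, maxFiles)); // 3
        System.out.println(mayEnqueue(3, 25, maxFiles));        // false
    }
}
```

With a cap like this, the queue length stays proportional to the number of files that could still be selected, rather than growing with every request made while selection is deferred.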





[jira] [Created] (HBASE-28016) hbck2 should support change region state of meta

2023-08-10 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-28016:
--

 Summary: hbck2 should support change region state of meta
 Key: HBASE-28016
 URL: https://issues.apache.org/jira/browse/HBASE-28016
 Project: HBase
  Issue Type: Improvement
  Components: hbase-operator-tools, hbck2
Affects Versions: hbase-operator-tools-1.2.0
Reporter: Zheng Wang
Assignee: Zheng Wang


The region state of meta is stored in ZooKeeper; if the state is wrong, we need 
a way to change it.





[jira] [Comment Edited] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-07-26 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747710#comment-17747710
 ] 

Zheng Wang edited comment on HBASE-27805 at 7/27/23 2:07 AM:
-

In this issue, we just updated the doc and provided a workaround.


was (Author: filtertip):
In this issue, we just updated the documentation and provided a workaround.

> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default chunk size is 2m. With G1, if heapRegionSize is 4m, these chunks 
> are allocated as humongous objects, each occupying one region exclusively, so 
> the remaining 2m of the region becomes a memory fragment.
> Many such fragments may lead to a full GC even when the percentage of used 
> heap is not high.
> I tested reducing the chunk size to 2047k (2m-1k, slightly less than half of 
> heapRegionSize), and the problem did not recur.
> BTW, in G1, humongous objects are objects larger than or equal to half the 
> size of a region, and heapRegionSize is calculated automatically from the 
> heap size if not explicitly specified.
>  
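
As a rough illustration of the arithmetic above, the following sketch checks whether a chunk of a given size would count as humongous under the rule quoted in the description (at least half a region). The class and method names are hypothetical, and the check ignores the array header that the JVM adds to each allocation, so treat it as an approximation rather than the collector's exact logic.

```java
// Hypothetical helper; not part of HBase or the JDK.
final class HumongousCheck {

    /** Applies the rule from the description: humongous means at least half a region. */
    static boolean isHumongous(long objectBytes, long heapRegionBytes) {
        return objectBytes >= heapRegionBytes / 2;
    }

    public static void main(String[] args) {
        long regionSize = 4L * 1024 * 1024;   // heapRegionSize of 4m
        long defaultChunk = 2L * 1024 * 1024; // default chunk size, 2m
        long reducedChunk = 2047L * 1024;     // reduced chunk size, 2047k

        System.out.println(isHumongous(defaultChunk, regionSize)); // true
        System.out.println(isHumongous(reducedChunk, regionSize)); // false
    }
}
```

The 2047k chunk falls just under the 2m threshold, which matches the observation that the reduced size avoided the fragmentation.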





[jira] [Commented] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-07-26 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747710#comment-17747710
 ] 

Zheng Wang commented on HBASE-27805:


In this issue, we just updated the documentation and provided a workaround.



[jira] [Resolved] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-07-26 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-27805.

Resolution: Fixed



[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-07-26 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Component/s: documentation
 (was: regionserver)



[jira] [Commented] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2023-07-17 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17743788#comment-17743788
 ] 

Zheng Wang commented on HBASE-26987:


Yeah, that's what the problem is.

One thing to consider about the solution: if the number of files to be 
compacted is very large but there is only one task in the queue, the queue 
length will not reflect the actual situation. [~zhangduo] 



[jira] [Commented] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740578#comment-17740578
 ] 

Zheng Wang commented on HBASE-27964:


Here is a test.

I wrote data to a cluster with two RegionServers and limited the compaction 
throughput. node9 (10.0.0.9) has the patch applied and delayed selection 
disabled. The store_file_count and store_file_size grow similarly on both 
nodes, but the compaction_queue_length differs greatly; the value on node23 
(10.0.0.23) is obviously incorrect.

!image-2023-07-06-19-51-21-354.png|width=503,height=192!

!image-2023-07-06-20-00-21-587.png|width=502,height=190!

!image-2023-07-06-20-00-41-933.png|width=501,height=192!

> Adds a switch for compaction's delay selection feature
> --
>
> Key: HBASE-27964
> URL: https://issues.apache.org/jira/browse/HBASE-27964
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2023-07-06-19-51-21-354.png, 
> image-2023-07-06-20-00-21-587.png, image-2023-07-06-20-00-41-933.png
>
>
> When compaction pressure is high, delayed selection can cause the compaction 
> queue length metric to keep increasing incorrectly. We should provide an 
> option to disable this feature for users who value metric accuracy more.
> See HBASE-26987 for more details.





[jira] [Updated] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27964:
---
Attachment: image-2023-07-06-20-00-21-587.png



[jira] [Updated] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27964:
---
Attachment: image-2023-07-06-20-00-41-933.png



[jira] [Updated] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27964:
---
Attachment: image-2023-07-06-19-51-21-354.png

> Adds a switch for compaction's delay selection feature
> --
>
> Key: HBASE-27964
> URL: https://issues.apache.org/jira/browse/HBASE-27964
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2023-07-06-19-51-21-354.png
>
>
> When compaction pressure is high, delayed selection can cause the compaction 
> queue length metric to keep increasing incorrectly. We should provide an 
> option to disable this feature for users who value metric accuracy more.
> See HBASE-26987 for more details.





[jira] [Comment Edited] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740515#comment-17740515
 ] 

Zheng Wang edited comment on HBASE-27964 at 7/6/23 9:27 AM:


Yeah, I think so. Since this patch does not fix the problem but only adds a 
switch, I set the type to Improvement.

[~zhangduo] 


was (Author: filtertip):
Yeah, I think so.  [~zhangduo] 

> Adds a switch for compaction's delay selection feature
> --
>
> Key: HBASE-27964
> URL: https://issues.apache.org/jira/browse/HBASE-27964
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> When compaction pressure is high, delayed selection can cause the compaction 
> queue length metric to keep increasing incorrectly. We should provide an 
> option to disable this feature for users who value metric accuracy more.
> See HBASE-26987 for more details.





[jira] [Commented] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740515#comment-17740515
 ] 

Zheng Wang commented on HBASE-27964:


Yeah, I think so.  [~zhangduo] 



[jira] [Created] (HBASE-27964) Adds a switch for compaction's delay selection feature

2023-07-06 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-27964:
--

 Summary: Adds a switch for compaction's delay selection feature
 Key: HBASE-27964
 URL: https://issues.apache.org/jira/browse/HBASE-27964
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: Zheng Wang
Assignee: Zheng Wang


When compaction pressure is high, delayed selection can cause the compaction 
queue length metric to keep increasing incorrectly. We should provide an 
option to disable this feature for users who value metric accuracy more.

See HBASE-26987 for more details.





[jira] [Commented] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721157#comment-17721157
 ] 

Zheng Wang commented on HBASE-27788:


Pushed to master and branch-2.

> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
> Attachments: BenchmarkForInnerStore.java, BenchmarkForNormal.java
>
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> When comparing cells within a single store, the families are always equal 
> (unless the familyLength is zero for a special purpose), so this step can be 
> skipped for better performance.
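
To make the idea concrete, here is a minimal sketch of the two comparison orders. It is not HBase code: SimpleCell and both compare methods are hypothetical stand-ins, only row, family, and qualifier are modeled (the real comparator goes on to further fields), and falling back to the full comparison when a family is empty is an assumption about how the zero-length case might be handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; SimpleCell and both comparators are illustrative, not HBase classes.
final class InnerStoreCompareSketch {

    static final class SimpleCell {
        final byte[] row;
        final byte[] family;
        final byte[] qualifier;

        SimpleCell(String row, String family, String qualifier) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.family = family.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** General order: row, then family, then qualifier. */
    static int compareFull(SimpleCell a, SimpleCell b) {
        int c = compareBytes(a.row, b.row);
        if (c != 0) {
            return c;
        }
        c = compareBytes(a.family, b.family);
        if (c != 0) {
            return c;
        }
        return compareBytes(a.qualifier, b.qualifier);
    }

    /** Within one store: skip the family step unless a family is empty. */
    static int compareInnerStore(SimpleCell a, SimpleCell b) {
        if (a.family.length == 0 || b.family.length == 0) {
            return compareFull(a, b); // assumed handling of the special zero-length case
        }
        int c = compareBytes(a.row, b.row);
        if (c != 0) {
            return c;
        }
        return compareBytes(a.qualifier, b.qualifier);
    }

    public static void main(String[] args) {
        SimpleCell x = new SimpleCell("r1", "fam1", "q1");
        SimpleCell y = new SimpleCell("r1", "fam1", "q2");
        // Both comparators order cells of the same family identically.
        System.out.println(Integer.signum(compareFull(x, y)) == Integer.signum(compareInnerStore(x, y)));
    }
}
```

When both cells share a non-empty family, the inner-store variant does one byte-array comparison less per call, which is where the saving would come from.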





[jira] [Resolved] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-27788.

Resolution: Fixed



[jira] [Updated] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27788:
---
Fix Version/s: 2.6.0



[jira] [Reopened] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang reopened HBASE-27788:


> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
> Attachments: BenchmarkForInnerStore.java, BenchmarkForNormal.java
>
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> When comparing cells within a single store, the families are always equal 
> (unless the familyLength is zero for a special purpose), so this step can be 
> skipped for better performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721148#comment-17721148
 ] 

Zheng Wang commented on HBASE-27788:


Oh, I forgot that. I will do it later.


[jira] [Resolved] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-27788.

Fix Version/s: 3.0.0-alpha-4
   Resolution: Fixed



[jira] [Commented] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-09 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721143#comment-17721143
 ] 

Zheng Wang commented on HBASE-27788:


Thanks for all the comments. [~zhangduo] [~bbeaudreault] 

> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: BenchmarkForInnerStore.java, BenchmarkForNormal.java
>
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> When comparing cells within a single store, the families are always equal 
> (unless the familyLength is zero for a special purpose), so this step can be 
> skipped for better performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-05-01 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27788:
---
Attachment: BenchmarkForInnerStore.java
BenchmarkForNormal.java



[jira] [Comment Edited] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-22 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712226#comment-17712226
 ] 

Zheng Wang edited comment on HBASE-27788 at 4/23/23 4:01 AM:
-

Test Env:
Linux 5.4, JDK 8, JMH 1.36, 8 cores and 16 GB of memory

Test Cmd:
java -jar benchmarks.jar -i 5 -r 10 -wi 5 -w 10 -o result.out

Test Mode:
throughput (higher is better)

 
|Benchmark|(p1)|(p2)|Mode|Cnt|Score|Error|Units|Diff|
|BenchmarkForInnerStore.new_compareBBKV| | |thrpt|5|28025769|± 85837.894|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareBBKV| |fam1|thrpt|5|45988795|± 743418.588|ops/s|14.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1| |thrpt|5|46169746|± 313848.117|ops/s|15.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1|fam1|thrpt|5|28340570|± 110743.597|ops/s|19.00%|
|BenchmarkForInnerStore.new_compareKV| | |thrpt|5|28555080|± 137117.752|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareKV| |fam1|thrpt|5|48428310|± 457635.029|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1| |thrpt|5|48493949|± 251767.842|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1|fam1|thrpt|5|28550667|± 115741.387|ops/s|27.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| | |thrpt|5|29217290|± 101649.947|ops/s|6.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| |fam1|thrpt|5|46949029|± 215794.996|ops/s|7.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV|fam1| |thrpt|5|46946670|± 146710.467|ops/s|7.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV|fam1|fam1|thrpt|5|29148782|± 206963.662|ops/s|20.00%|
|BenchmarkForInnerStore.old_compareBBKV| | |thrpt|5|27675873|± 276983.891|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV| |fam1|thrpt|5|40225985|± 333777.174|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV|fam1| |thrpt|5|40187512|± 242635.903|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV|fam1|fam1|thrpt|5|23719010|± 78500.923|ops/s| |
|BenchmarkForInnerStore.old_compareKV| | |thrpt|5|28263508|± 80403.361|ops/s| |
|BenchmarkForInnerStore.old_compareKV| |fam1|thrpt|5|43253529|± 227223.861|ops/s| |
|BenchmarkForInnerStore.old_compareKV|fam1| |thrpt|5|43251637|± 370669.972|ops/s| |
|BenchmarkForInnerStore.old_compareKV|fam1|fam1|thrpt|5|22556530|± 131922.278|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV| | |thrpt|5|27607601|± 181466.155|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV| |fam1|thrpt|5|43838946|± 147828.804|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV|fam1| |thrpt|5|43853799|± 159898.926|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV|fam1|fam1|thrpt|5|24349838|± 233577.807|ops/s| |
|BenchmarkForNormal.new_compareBBKV|fam|fam1|thrpt|5|35397671|± 87148.764|ops/s|-3.00%|
|BenchmarkForNormal.new_compareBBKV|fam1|fam|thrpt|5|34853758|± 181728.193|ops/s|-4.00%|
|BenchmarkForNormal.new_compareKV|fam|fam1|thrpt|5|36379792|± 103787.745|ops/s|0.00%|
|BenchmarkForNormal.new_compareKV|fam1|fam|thrpt|5|36389642|± 215220.231|ops/s|0.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam|fam1|thrpt|5|39477917|± 116925.262|ops/s|0.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam1|fam|thrpt|5|39381461|± 196771.635|ops/s|0.00%|
|BenchmarkForNormal.old_compareBBKV|fam|fam1|thrpt|5|36419133|± 159715.504|ops/s| |
|BenchmarkForNormal.old_compareBBKV|fam1|fam|thrpt|5|36422202|± 86247.635|ops/s| |
|BenchmarkForNormal.old_compareKV|fam|fam1|thrpt|5|36247387|± 95109.893|ops/s| |
|BenchmarkForNormal.old_compareKV|fam1|fam|thrpt|5|36260304|± 63266.840|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam|fam1|thrpt|5|39326233|± 218927.739|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam1|fam|thrpt|5|39297932|± 487026.618|ops/s| |
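
The comment does not say how the Diff column was computed. The values are consistent with the relative change of the new score over the matching old score, rounded to a whole percent, so the sketch below assumes that formula. The class and method names are hypothetical.

```java
// Hypothetical helper; the formula is inferred from the table, not stated in the comment.
final class BenchmarkDiff {

    /** Relative change of newScore over oldScore, rounded to a whole percent. */
    static long diffPercent(double oldScore, double newScore) {
        return Math.round((newScore - oldScore) / oldScore * 100.0);
    }

    public static void main(String[] args) {
        // BenchmarkForInnerStore compareBBKV, fam1/fam1
        System.out.println(diffPercent(23719010, 28340570)); // 19
        // BenchmarkForInnerStore compareKV, fam1/fam1
        System.out.println(diffPercent(22556530, 28550667)); // 27
        // BenchmarkForNormal compareBBKV, fam/fam1
        System.out.println(diffPercent(36419133, 35397671)); // -3
    }
}
```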


was (Author: filtertip):
Test Env:
Linux 5.4, JDK 8, JMH 1.36, 8 cores and 16 GB of memory

Test Cmd:
java -jar benchmarks.jar -i 5 -r 10 -wi 5 -w 10 -o result.out

Test Mode:
throughput (higher is better)

 
|Benchmark|(p1)|(p2)|Mode|Cnt|Score|Error|Units|Diff|
|BenchmarkForInnerStore.new_compareBBKV| | |thrpt|5|28025769|± 85837.894|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareBBKV| |fam1|thrpt|5|45988795|± 743418.588|ops/s|14.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1| |thrpt|5|46169746|± 313848.117|ops/s|15.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1|fam1|thrpt|5|28340570|± 110743.597|ops/s|19.00%|
|BenchmarkForInnerStore.new_compareKV| | |thrpt|5|28555080|± 137117.752|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareKV| |fam1|thrpt|5|48428310|± 457635.029|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1| |thrpt|5|48493949|± 251767.842|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1|fam1|thrpt|5|28550667|± 115741.387|ops/s|27.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| | |thrpt|5|29217290|± 101649.947|ops/s|6.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| |fam1|thrpt|5|46949029|± 215794.996|ops/s|7.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV|fam1| |thrpt|5|46946670|± 146710.467|ops/s|7.00%|

[jira] [Comment Edited] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-21 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712226#comment-17712226
 ] 

Zheng Wang edited comment on HBASE-27788 at 4/22/23 3:34 AM:
-

Test Env:
linux5.4, jdk8, jmh1.36, 8c16g

Test Cmd:
java -jar benchmarks.jar -i 5 -r 10 -wi 5 -w 10 -o result.out

Test Mode:
throughput (higher is better)

 
|Benchmark|(p1)|(p2)|Mode|Cnt|Score|Error|Units|Diff|
|BenchmarkForInnerStore.new_compareBBKV| | |thrpt|5|28025769|± 85837.894|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareBBKV| |fam1|thrpt|5|45988795|± 743418.588|ops/s|14.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1| |thrpt|5|46169746|± 313848.117|ops/s|15.00%|
|BenchmarkForInnerStore.new_compareBBKV|fam1|fam1|thrpt|5|28340570|± 110743.597|ops/s|19.00%|
|BenchmarkForInnerStore.new_compareKV| | |thrpt|5|28555080|± 137117.752|ops/s|1.00%|
|BenchmarkForInnerStore.new_compareKV| |fam1|thrpt|5|48428310|± 457635.029|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1| |thrpt|5|48493949|± 251767.842|ops/s|12.00%|
|BenchmarkForInnerStore.new_compareKV|fam1|fam1|thrpt|5|28550667|± 115741.387|ops/s|27.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| | |thrpt|5|29217290|± 101649.947|ops/s|6.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV| |fam1|thrpt|5|46949029|± 215794.996|ops/s|7.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV|fam1| |thrpt|5|46946670|± 146710.467|ops/s|7.00%|
|BenchmarkForInnerStore.new_compareKVVsBBKV|fam1|fam1|thrpt|5|29148782|± 206963.662|ops/s|20.00%|
|BenchmarkForInnerStore.old_compareBBKV| | |thrpt|5|27675873|± 276983.891|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV| |fam1|thrpt|5|40225985|± 333777.174|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV|fam1| |thrpt|5|40187512|± 242635.903|ops/s| |
|BenchmarkForInnerStore.old_compareBBKV|fam1|fam1|thrpt|5|23719010|± 78500.923|ops/s| |
|BenchmarkForInnerStore.old_compareKV| | |thrpt|5|28263508|± 80403.361|ops/s| |
|BenchmarkForInnerStore.old_compareKV| |fam1|thrpt|5|43253529|± 227223.861|ops/s| |
|BenchmarkForInnerStore.old_compareKV|fam1| |thrpt|5|43251637|± 370669.972|ops/s| |
|BenchmarkForInnerStore.old_compareKV|fam1|fam1|thrpt|5|22556530|± 131922.278|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV| | |thrpt|5|27607601|± 181466.155|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV| |fam1|thrpt|5|43838946|± 147828.804|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV|fam1| |thrpt|5|43853799|± 159898.926|ops/s| |
|BenchmarkForInnerStore.old_compareKVVsBBKV|fam1|fam1|thrpt|5|24349838|± 233577.807|ops/s| |
|BenchmarkForNormal.new_compareBBKV|fam|fam|thrpt|5|24232184|± 119228.533|ops/s|-1.00%|
|BenchmarkForNormal.new_compareBBKV|fam|fam1|thrpt|5|35397671|± 87148.764|ops/s|-3.00%|
|BenchmarkForNormal.new_compareBBKV|fam1|fam|thrpt|5|34853758|± 181728.193|ops/s|-4.00%|
|BenchmarkForNormal.new_compareBBKV|fam1|fam1|thrpt|5|23348288|± 210662.654|ops/s|-2.00%|
|BenchmarkForNormal.new_compareKV|fam|fam|thrpt|5|23545532|± 300722.638|ops/s|0.00%|
|BenchmarkForNormal.new_compareKV|fam|fam1|thrpt|5|36379792|± 103787.745|ops/s|0.00%|
|BenchmarkForNormal.new_compareKV|fam1|fam|thrpt|5|36389642|± 215220.231|ops/s|0.00%|
|BenchmarkForNormal.new_compareKV|fam1|fam1|thrpt|5|22781448|± 334278.380|ops/s|1.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam|fam|thrpt|5|24419066|± 178926.313|ops/s|-3.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam|fam1|thrpt|5|39477917|± 116925.262|ops/s|0.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam1|fam|thrpt|5|39381461|± 196771.635|ops/s|0.00%|
|BenchmarkForNormal.new_compareKVVsBBKV|fam1|fam1|thrpt|5|23624400|± 402220.882|ops/s|-3.00%|
|BenchmarkForNormal.old_compareBBKV|fam|fam|thrpt|5|24485218|± 95396.313|ops/s| |
|BenchmarkForNormal.old_compareBBKV|fam|fam1|thrpt|5|36419133|± 159715.504|ops/s| |
|BenchmarkForNormal.old_compareBBKV|fam1|fam|thrpt|5|36422202|± 86247.635|ops/s| |
|BenchmarkForNormal.old_compareBBKV|fam1|fam1|thrpt|5|23734773|± 210072.328|ops/s| |
|BenchmarkForNormal.old_compareKV|fam|fam|thrpt|5|23534333|± 57022.884|ops/s| |
|BenchmarkForNormal.old_compareKV|fam|fam1|thrpt|5|36247387|± 95109.893|ops/s| |
|BenchmarkForNormal.old_compareKV|fam1|fam|thrpt|5|36260304|± 63266.840|ops/s| |
|BenchmarkForNormal.old_compareKV|fam1|fam1|thrpt|5|22582939|± 50450.874|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam|fam|thrpt|5|25144704|± 291029.655|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam|fam1|thrpt|5|39326233|± 218927.739|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam1|fam|thrpt|5|39297932|± 487026.618|ops/s| |
|BenchmarkForNormal.old_compareKVVsBBKV|fam1|fam1|thrpt|5|24395274|± 149132.118|ops/s| |
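
The Diff column in the table above appears to be the relative change of each new_* score against the matching old_* score, rounded to a whole percent. That reading is inferred from the numbers rather than stated in the comment. A minimal sketch of the arithmetic, where the class and method names are made up for illustration:

```java
/** Illustrative helper, not part of HBase or JMH: reproduces the Diff column. */
final class BenchDiff {
    private BenchDiff() {}

    /** Relative change of newValue against oldValue, rounded to a whole percent. */
    static long diffPercent(double oldValue, double newValue) {
        return Math.round((newValue - oldValue) / oldValue * 100.0);
    }

    public static void main(String[] args) {
        // BenchmarkForInnerStore compareKV, fam1 vs fam1: old 22556530, new 28550667 ops/s
        System.out.println(diffPercent(22556530, 28550667) + "%"); // prints 27%
        // BenchmarkForNormal compareBBKV, fam vs fam1: old 36419133, new 35397671 ops/s
        System.out.println(diffPercent(36419133, 35397671) + "%"); // prints -3%
    }
}
```

Since the mode is throughput, a positive value means the new comparator completed more comparisons per second.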


was (Author: filtertip):
Perf test report, copied from the PR (see PerfTestCellComparator.java, with 
compareCnt set to 1 billion).

 
|compareMethod|leftFamLen|rightFamLen|comparator|cost(ms)|diff|
|compareKV|0|0|CellComparatorImpl|28850| |

[jira] [Commented] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-04-20 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714630#comment-17714630
 ] 

Zheng Wang commented on HBASE-27805:


Ok. [~zhangduo] 

> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default chunk size is 2m. When we use G1 and heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, each exclusively occupying one 
> region, so the remaining 2m becomes memory fragmentation.
> Lots of memory fragmentation may lead to a full GC even if the percentage of 
> used heap is not high.
> I have tested reducing the chunk size to 2047k (2m-1k, slightly less than half 
> of heapRegionSize), and the problem did not recur.
> BTW, in G1, humongous objects are objects larger than or equal to half the size 
> of a region, and the heapRegionSize is automatically calculated based on the 
> heap size parameter if not explicitly specified.
>  
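
The arithmetic behind the description above can be sketched as follows. This is a simplified model that uses the "larger than or equal to half a region" definition quoted in the issue and ignores object headers; the class and method names are hypothetical and are not HBase or JDK APIs.

```java
/** Illustrative model of G1 humongous allocation; not an HBase or JDK API. */
final class HumongousCheck {
    private HumongousCheck() {}

    /** Per the definition quoted in the issue: size >= half of one heap region. */
    static boolean isHumongous(long objectBytes, long heapRegionBytes) {
        return objectBytes >= heapRegionBytes / 2;
    }

    /** Bytes left unused in the region(s) reserved for a humongous object. */
    static long wastedBytes(long objectBytes, long heapRegionBytes) {
        if (!isHumongous(objectBytes, heapRegionBytes)) {
            return 0L; // regular objects share a region with other objects
        }
        long regions = (objectBytes + heapRegionBytes - 1) / heapRegionBytes; // round up
        return regions * heapRegionBytes - objectBytes;
    }

    public static void main(String[] args) {
        long region = 4L * 1024 * 1024;   // heapRegionSize = 4m
        long chunk2048k = 2048L * 1024;   // default chunk size, 2m
        long chunk2047k = 2047L * 1024;   // reduced chunk size, 2m-1k

        System.out.println(isHumongous(chunk2048k, region)); // prints true
        System.out.println(wastedBytes(chunk2048k, region)); // prints 2097152
        System.out.println(isHumongous(chunk2047k, region)); // prints false
    }
}
```

Under this model each 2m chunk leaves 2m of its 4m region unusable, while a 2047k chunk stays below the threshold and is allocated like a regular object.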





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Description: 
The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2m become memory fragement.

Lots of memory fragement may lead to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
of heapRegionSize), there was no repeat of the above.

BTW, in G1, humongous objects are objects larger or equal the size of half a 
region, and the heapRegionSize is automatically calculated based on the heap 
size parameter if not explicitly specified.

 

  was:
The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2m become memory fragement.

Lots of memory fragement may lead to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
of heapRegionSize), there was no repeat of the above.

BTW, in g1, humongous objects are objects larger or equal the size of half a 
region.

 


> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default chunk size is 2m. When we use G1 and heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, each exclusively occupying one 
> region, so the remaining 2m becomes memory fragmentation.
> Lots of memory fragmentation may lead to a full GC even if the percentage of 
> used heap is not high.
> I have tested reducing the chunk size to 2047k (2m-1k, slightly less than half 
> of heapRegionSize), and the problem did not recur.
> BTW, in G1, humongous objects are objects larger than or equal to half the size 
> of a region, and the heapRegionSize is automatically calculated based on the 
> heap size parameter if not explicitly specified.
>  





[jira] [Commented] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-04-19 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714381#comment-17714381
 ] 

Zheng Wang commented on HBASE-27805:


Not sure if we should change the default chunk size to 2047k; any suggestions 
are welcome.

> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, exclusively allocating one 
> region, then the remaining 2m become memory fragement.
> Lots of memory fragement may lead to fullgc even if the percent of used heap 
> not high enough.
> I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
> of heapRegionSize), there was no repeat of the above.
> BTW, in g1, humongous objects are objects larger or equal the size of half a 
> region.
>  





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Description: 
The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2m become memory fragement.

Lots of memory fragement may lead to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
of heapRegionSize), there was no repeat of the above.

BTW, in g1, humongous objects are objects larger or equal the size of half a 
region.

 

  was:
The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2m become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
of heapRegionSize), there was no repeat of the above.

BTW, in g1, humongous objects are objects larger or equal the size of half a 
region.

 


> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, exclusively allocating one 
> region, then the remaining 2m become memory fragement.
> Lots of memory fragement may lead to fullgc even if the percent of used heap 
> not high enough.
> I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
> of heapRegionSize), there was no repeat of the above.
> BTW, in g1, humongous objects are objects larger or equal the size of half a 
> region.
>  





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and lead to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Summary: The chunk created by mslab may cause memory fragement and lead to 
fullgc  (was: The chunk created by mslab may cause memory fragement and leading 
to fullgc)

> The chunk created by mslab may cause memory fragement and lead to fullgc
> 
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, exclusively allocating one 
> region, then the remaining 2m become memory fragement.
> Lots of memory fragement may leading to fullgc even if the percent of used 
> heap not high enough.
> I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
> of heapRegionSize), there was no repeat of the above.
> BTW, in g1, humongous objects are objects larger or equal the size of half a 
> region.
>  





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and leading to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Description: 
The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2m become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
of heapRegionSize), there was no repeat of the above.

BTW, in g1, humongous objects are objects larger or equal the size of half a 
region.

 

  was:
The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 4MB, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2MB become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k, there was no repeat of the 
above.


> The chunk created by mslab may cause memory fragement and leading to fullgc
> ---
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2m, when we use G1, if heapRegionSize equals 4m, 
> these chunks are allocated as humongous objects, exclusively allocating one 
> region, then the remaining 2m become memory fragement.
> Lots of memory fragement may leading to fullgc even if the percent of used 
> heap not high enough.
> I have tested to reduce the chunk size to 2047k(2m-1k, a bit lesser than half 
> of heapRegionSize), there was no repeat of the above.
> BTW, in g1, humongous objects are objects larger or equal the size of half a 
> region.
>  





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and leading to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Description: 
The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 4MB, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2MB become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.

I have tested to reduce the chunk size to 2047k, there was no repeat of the 
above.

  was:
The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 4MB, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2MB become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.


> The chunk created by mslab may cause memory fragement and leading to fullgc
> ---
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 
> 4MB, these chunks are allocated as humongous objects, exclusively allocating 
> one region, then the remaining 2MB become memory fragement.
> Lots of memory fragement may leading to fullgc even if the percent of used 
> heap not high enough.
> I have tested to reduce the chunk size to 2047k, there was no repeat of the 
> above.





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and leading to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Attachment: chunksize-2047k.png

> The chunk created by mslab may cause memory fragement and leading to fullgc
> ---
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2047k.png, chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 
> 4MB, these chunks are allocated as humongous objects, exclusively allocating 
> one region, then the remaining 2MB become memory fragement.
> Lots of memory fragement may leading to fullgc even if the percent of used 
> heap not high enough.





[jira] [Updated] (HBASE-27805) The chunk created by mslab may cause memory fragement and leading to fullgc

2023-04-19 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27805:
---
Attachment: chunksize-2048k-fullgc.png

> The chunk created by mslab may cause memory fragement and leading to fullgc
> ---
>
> Key: HBASE-27805
> URL: https://issues.apache.org/jira/browse/HBASE-27805
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: chunksize-2048k-fullgc.png
>
>
> The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 
> 4MB, these chunks are allocated as humongous objects, exclusively allocating 
> one region, then the remaining 2MB become memory fragement.
> Lots of memory fragement may leading to fullgc even if the percent of used 
> heap not high enough.





[jira] [Created] (HBASE-27805) The chunk created by mslab may cause memory fragement and leading to fullgc

2023-04-19 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-27805:
--

 Summary: The chunk created by mslab may cause memory fragement and 
leading to fullgc
 Key: HBASE-27805
 URL: https://issues.apache.org/jira/browse/HBASE-27805
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Zheng Wang
Assignee: Zheng Wang


The default size of chunk is 2MB, when we use G1, if heapRegionSize equals 4MB, 
these chunks are allocated as humongous objects, exclusively allocating one 
region, then the remaining 2MB become memory fragement.

Lots of memory fragement may leading to fullgc even if the percent of used heap 
not high enough.





[jira] [Commented] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-14 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712587#comment-17712587
 ] 

Zheng Wang commented on HBASE-27788:


A write test case with PE:

nohup hbase pe --table=TestTable1 --nomapred --oneCon=true --valueSize=10 \
  --rows=100 --columns=200 --autoFlush=true --presplit=10 --multiPut=100 \
  --writeToWAL=false sequentialWrite 1 > nohup.out 2>&1 &

//before patch
Finished TestClient-0 in 396657ms over 100 rows

//after patch
Finished TestClient-0 in 356932ms over 100 rows

> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> If the comparison happens inside a store, the families are always equal (unless 
> the familyLength is zero for a special purpose), so this step can be skipped for 
> better performance.
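
The idea in the description above can be sketched roughly as follows. This is a simplified model with only row, family and qualifier, and it is not the code from the patch: SimpleCell, compareFull and compareInnerStore are hypothetical names, and comparing only the family lengths is one assumed way to handle the zero-length special case.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; the real HBase comparators handle more fields. */
final class InnerStoreCompareSketch {
    private InnerStoreCompareSketch() {}

    /** Hypothetical minimal cell: row, family and qualifier only. */
    static final class SimpleCell {
        final byte[] row;
        final byte[] family;
        final byte[] qualifier;

        SimpleCell(byte[] row, byte[] family, byte[] qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    static SimpleCell cell(String row, String family, String qualifier) {
        return new SimpleCell(row.getBytes(StandardCharsets.UTF_8),
            family.getBytes(StandardCharsets.UTF_8),
            qualifier.getBytes(StandardCharsets.UTF_8));
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    /** General case: row, then family bytes, then qualifier. */
    static int compareFull(SimpleCell l, SimpleCell r) {
        int c = compareBytes(l.row, r.row);
        if (c != 0) {
            return c;
        }
        c = compareBytes(l.family, r.family);
        if (c != 0) {
            return c;
        }
        return compareBytes(l.qualifier, r.qualifier);
    }

    /**
     * Inside one store: non-empty families are assumed equal, so only the
     * lengths are compared, which still orders an empty family first.
     */
    static int compareInnerStore(SimpleCell l, SimpleCell r) {
        int c = compareBytes(l.row, r.row);
        if (c != 0) {
            return c;
        }
        c = l.family.length - r.family.length; // no byte-by-byte family compare
        if (c != 0) {
            return c;
        }
        return compareBytes(l.qualifier, r.qualifier);
    }

    public static void main(String[] args) {
        SimpleCell a = cell("r1", "fam1", "q1");
        SimpleCell b = cell("r1", "fam1", "q2");
        System.out.println(Integer.signum(compareFull(a, b)));       // prints -1
        System.out.println(Integer.signum(compareInnerStore(a, b))); // prints -1
    }
}
```

The shortcut is only valid when both cells belong to the same store; for cells from different families the two methods can disagree.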





[jira] [Comment Edited] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-14 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712226#comment-17712226
 ] 

Zheng Wang edited comment on HBASE-27788 at 4/14/23 7:50 AM:
-

Perf test report, copied from the PR (see PerfTestCellComparator.java, with 
compareCnt set to 1 billion).

 
|compareMethod|leftFamLen|rightFamLen|comparator|cost(ms)|diff|
|compareKV|0|0|CellComparatorImpl|28850| |
|compareKV|0|0|InnerStoreCellComparator|27478|-5.00%|
|compareKV|0|4|CellComparatorImpl|19041| |
|compareKV|0|4|InnerStoreCellComparator|17391|-9.00%|
|compareKV|4|0|CellComparatorImpl|18988| |
|compareKV|4|0|InnerStoreCellComparator|17375|-8.00%|
|compareKV|4|4|CellComparatorImpl|33360| |
|compareKV|4|4|InnerStoreCellComparator|27083|-19.00%|
|compareBBKV|0|0|CellComparatorImpl|34014| |
|compareBBKV|0|0|InnerStoreCellComparator|31660|-7.00%|
|compareBBKV|0|4|CellComparatorImpl|20780| |
|compareBBKV|0|4|InnerStoreCellComparator|20847|0.00%|
|compareBBKV|4|0|CellComparatorImpl|23540| |
|compareBBKV|4|0|InnerStoreCellComparator|21751|-8.00%|
|compareBBKV|4|4|CellComparatorImpl|40192| |
|compareBBKV|4|4|InnerStoreCellComparator|31522|-22.00%|
|compareKVVsBBKV|0|0|CellComparatorImpl|30979| |
|compareKVVsBBKV|0|0|InnerStoreCellComparator|29827|-4.00%|
|compareKVVsBBKV|0|4|CellComparatorImpl|21918| |
|compareKVVsBBKV|0|4|InnerStoreCellComparator|19143|-13.00%|
|compareKVVsBBKV|4|0|CellComparatorImpl|22605| |
|compareKVVsBBKV|4|0|InnerStoreCellComparator|20952|-7.00%|
|compareKVVsBBKV|4|4|CellComparatorImpl|35561| |
|compareKVVsBBKV|4|4|InnerStoreCellComparator|29150|-18.00%|


was (Author: filtertip):
Perf test report (compareCnt set to 1 billion), copied from the PR.

 
|compareMethod|leftFamLen|rightFamLen|comparator|cost(ms)|diff|
|compareKV|0|0|CellComparatorImpl|28850| |
|compareKV|0|0|InnerStoreCellComparator|27478|-5.00%|
|compareKV|0|4|CellComparatorImpl|19041| |
|compareKV|0|4|InnerStoreCellComparator|17391|-9.00%|
|compareKV|4|0|CellComparatorImpl|18988| |
|compareKV|4|0|InnerStoreCellComparator|17375|-8.00%|
|compareKV|4|4|CellComparatorImpl|33360| |
|compareKV|4|4|InnerStoreCellComparator|27083|-19.00%|
|compareBBKV|0|0|CellComparatorImpl|34014| |
|compareBBKV|0|0|InnerStoreCellComparator|31660|-7.00%|
|compareBBKV|0|4|CellComparatorImpl|20780| |
|compareBBKV|0|4|InnerStoreCellComparator|20847|0.00%|
|compareBBKV|4|0|CellComparatorImpl|23540| |
|compareBBKV|4|0|InnerStoreCellComparator|21751|-8.00%|
|compareBBKV|4|4|CellComparatorImpl|40192| |
|compareBBKV|4|4|InnerStoreCellComparator|31522|-22.00%|
|compareKVVsBBKV|0|0|CellComparatorImpl|30979| |
|compareKVVsBBKV|0|0|InnerStoreCellComparator|29827|-4.00%|
|compareKVVsBBKV|0|4|CellComparatorImpl|21918| |
|compareKVVsBBKV|0|4|InnerStoreCellComparator|19143|-13.00%|
|compareKVVsBBKV|4|0|CellComparatorImpl|22605| |
|compareKVVsBBKV|4|0|InnerStoreCellComparator|20952|-7.00%|
|compareKVVsBBKV|4|4|CellComparatorImpl|35561| |
|compareKVVsBBKV|4|4|InnerStoreCellComparator|29150|-18.00%|

> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> If the comparison happens inside a store, the families are always equal (unless 
> the familyLength is zero for a special purpose), so this step can be skipped for 
> better performance.





[jira] [Commented] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-14 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712226#comment-17712226
 ] 

Zheng Wang commented on HBASE-27788:


Perf test report (compareCnt set to 1 billion), copied from the PR.

 
|compareMethod|leftFamLen|rightFamLen|comparator|cost(ms)|diff|
|compareKV|0|0|CellComparatorImpl|28850| |
|compareKV|0|0|InnerStoreCellComparator|27478|-5.00%|
|compareKV|0|4|CellComparatorImpl|19041| |
|compareKV|0|4|InnerStoreCellComparator|17391|-9.00%|
|compareKV|4|0|CellComparatorImpl|18988| |
|compareKV|4|0|InnerStoreCellComparator|17375|-8.00%|
|compareKV|4|4|CellComparatorImpl|33360| |
|compareKV|4|4|InnerStoreCellComparator|27083|-19.00%|
|compareBBKV|0|0|CellComparatorImpl|34014| |
|compareBBKV|0|0|InnerStoreCellComparator|31660|-7.00%|
|compareBBKV|0|4|CellComparatorImpl|20780| |
|compareBBKV|0|4|InnerStoreCellComparator|20847|0.00%|
|compareBBKV|4|0|CellComparatorImpl|23540| |
|compareBBKV|4|0|InnerStoreCellComparator|21751|-8.00%|
|compareBBKV|4|4|CellComparatorImpl|40192| |
|compareBBKV|4|4|InnerStoreCellComparator|31522|-22.00%|
|compareKVVsBBKV|0|0|CellComparatorImpl|30979| |
|compareKVVsBBKV|0|0|InnerStoreCellComparator|29827|-4.00%|
|compareKVVsBBKV|0|4|CellComparatorImpl|21918| |
|compareKVVsBBKV|0|4|InnerStoreCellComparator|19143|-13.00%|
|compareKVVsBBKV|4|0|CellComparatorImpl|22605| |
|compareKVVsBBKV|4|0|InnerStoreCellComparator|20952|-7.00%|
|compareKVVsBBKV|4|4|CellComparatorImpl|35561| |
|compareKVVsBBKV|4|4|InnerStoreCellComparator|29150|-18.00%|

> Skip family comparing when compare cells inner the store
> 
>
> Key: HBASE-27788
> URL: https://issues.apache.org/jira/browse/HBASE-27788
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> Currently we use CellComparatorImpl to compare cells; it compares the row 
> first, then the family, then the qualifier, and so on.
> If the comparison happens inside a store, the families are always equal (unless 
> the familyLength is zero for a special purpose), so this step can be skipped for 
> better performance.





[jira] [Created] (HBASE-27788) Skip family comparing when compare cells inner the store

2023-04-11 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-27788:
--

 Summary: Skip family comparing when compare cells inner the store
 Key: HBASE-27788
 URL: https://issues.apache.org/jira/browse/HBASE-27788
 Project: HBase
  Issue Type: Improvement
  Components: Performance
Reporter: Zheng Wang
Assignee: Zheng Wang


Currently we use CellComparatorImpl to compare cells; it compares the row 
first, then the family, then the qualifier, and so on.

If the comparison happens inside a store, the families are always equal (unless 
the familyLength is zero for a special purpose), so this step can be skipped for 
better performance.





[jira] [Commented] (HBASE-27765) Add biggest cell related info into web ui

2023-04-05 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17708982#comment-17708982
 ] 

Zheng Wang commented on HBASE-27765:


I have filled in the release note.

Thanks a lot for the review and push. [~zhangduo] 

> Add biggest cell related info into web ui
> -
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> Large cells have some disadvantages, such as not being cacheable or causing 
> memory fragmentation, but currently users can't easily find them.
> My proposal is to save the length and key of the biggest cell into the fileinfo 
> of the hfile and show them on the web UI in two places.
> 1: Add "Len Of Biggest Cell" to the main page of the regionServer, where we can 
> find out which regions have large cells by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" to the region page, 
> where we can find the exact key and the hfile.
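
The bookkeeping proposed above could look roughly like the sketch below: while cells are appended to a file, remember the length and key of the largest one so both can be stored as file metadata afterwards. The class and method names are hypothetical, and this is not the code from the patch.

```java
/** Illustrative tracker; hypothetical names, not the HBase HFile writer API. */
final class BiggestCellTracker {
    private long lenOfBiggestCell = 0L;
    private String keyOfBiggestCell = null;

    /** Call once per appended cell with its key and serialized length in bytes. */
    void track(String cellKey, long cellLength) {
        if (cellLength > lenOfBiggestCell) {
            lenOfBiggestCell = cellLength;
            keyOfBiggestCell = cellKey;
        }
    }

    long lenOfBiggestCell() {
        return lenOfBiggestCell;
    }

    /** Returns null when no cell has been tracked yet. */
    String keyOfBiggestCell() {
        return keyOfBiggestCell;
    }

    public static void main(String[] args) {
        BiggestCellTracker tracker = new BiggestCellTracker();
        tracker.track("row1/fam:q1", 120);
        tracker.track("row2/fam:q1", 4096);
        tracker.track("row3/fam:q1", 300);
        System.out.println(tracker.lenOfBiggestCell()); // prints 4096
        System.out.println(tracker.keyOfBiggestCell()); // prints row2/fam:q1
    }
}
```

Keeping only a running maximum costs one comparison per cell, and a per-region figure for the regionServer page can then be taken as the maximum over the region's files.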





[jira] [Updated] (HBASE-27765) Add biggest cell related info into web ui

2023-04-05 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27765:
---
Release Note: Save the length and key of the biggest cell into fileinfo when 
generating an hfile, and show them on the web UI for better monitoring.

> Add biggest cell related info into web ui
> -
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are some disadvantages to large cell, such as can't be cached or cause 
> memory fragmentation, but currently user can't easily to find them out.
> My proposal is save len and key of the biggest cell into fileinfo of hfile, 
> and shown on web ui, including two places.
> 1: Add "Len Of Biggest Cell" into main page of regionServer, in here we can 
> find out which regions has large cell by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" into region page, in 
> here we can find out the exactly key and the hfile.





[jira] [Updated] (HBASE-27765) Add biggest cell related info into web ui

2023-03-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27765:
---
Fix Version/s: 3.0.0-alpha-4

> Add biggest cell related info into web ui
> -
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are some disadvantages to large cell, such as can't be cached or cause 
> memory fragmentation, but currently user can't easily to find them out.
> My proposal is save len and key of the biggest cell into fileinfo of hfile, 
> and shown on web ui, including two places.
> 1: Add "Len Of Biggest Cell" into main page of regionServer, in here we can 
> find out which regions has large cell by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" into region page, in 
> here we can find out the exactly key and the hfile.





[jira] [Updated] (HBASE-27765) Add biggest cell related info into web ui

2023-03-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27765:
---
Summary: Add biggest cell related info into web ui  (was: Add biggest cell 
related info web ui)

> Add biggest cell related info into web ui
> -
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are some disadvantages to large cell, such as can't be cached or cause 
> memory fragmentation, but currently user can't easily to find them out.
> My proposal is save len and key of the biggest cell into fileinfo of hfile, 
> and shown on web ui, including two places.
> 1: Add "Len Of Biggest Cell" into main page of regionServer, in here we can 
> find out which regions has large cell by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" into region page, in 
> here we can find out the exactly key and the hfile.





[jira] [Updated] (HBASE-27765) Add biggest cell related info web ui

2023-03-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27765:
---
Attachment: screenshot-2.png

> Add biggest cell related info web ui
> 
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are some disadvantages to large cell, such as can't be cached or cause 
> memory fragmentation, but currently user can't easily to find them out.
> My proposal is save len and key of the biggest cell into fileinfo of hfile, 
> and shown on web ui, including two places.
> 1: Add "Len Of Biggest Cell" into main page of regionServer, in here we can 
> find out which regions has large cell by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" into region page, in 
> here we can find out the exactly key and the hfile.





[jira] [Updated] (HBASE-27765) Add biggest cell related info web ui

2023-03-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-27765:
---
Attachment: screenshot-1.png

> Add biggest cell related info web ui
> 
>
> Key: HBASE-27765
> URL: https://issues.apache.org/jira/browse/HBASE-27765
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile, UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> There are some disadvantages to large cell, such as can't be cached or cause 
> memory fragmentation, but currently user can't easily to find them out.
> My proposal is save len and key of the biggest cell into fileinfo of hfile, 
> and shown on web ui, including two places.
> 1: Add "Len Of Biggest Cell" into main page of regionServer, in here we can 
> find out which regions has large cell by sorting.
> 2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" into region page, in 
> here we can find out the exactly key and the hfile.





[jira] [Created] (HBASE-27765) Add biggest cell related info web ui

2023-03-29 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-27765:
--

 Summary: Add biggest cell related info web ui
 Key: HBASE-27765
 URL: https://issues.apache.org/jira/browse/HBASE-27765
 Project: HBase
  Issue Type: Improvement
  Components: HFile, UI
Reporter: Zheng Wang
Assignee: Zheng Wang


Large cells have some disadvantages, for example they can't be cached and may
cause memory fragmentation, but currently users can't easily find them.

My proposal is to save the length and key of the biggest cell into the fileinfo
of the hfile, and show them on the web UI in two places.

1: Add "Len Of Biggest Cell" to the main page of the regionServer, where we can
find out which regions have large cells by sorting.

2: Add "Len Of Biggest Cell" and "Key Of Biggest Cell" to the region page,
where we can find out the exact key and the hfile.
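
The proposal above could be sketched roughly as follows. This is only an illustration, assuming a writer that sees each cell once as it is appended; BiggestCellTracker and its method names are hypothetical and are not HBase classes or the code in the actual patch.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not an HBase class): remembers the length and key
 * of the largest cell seen while one file is being written.
 */
final class BiggestCellTracker {
    private int maxLen = 0;       // length of the biggest cell seen so far
    private byte[] maxKey = null; // key of that cell, copied defensively

    /** Call once for every cell appended to the file. */
    void observe(byte[] key, int cellLen) {
        if (cellLen > maxLen) {
            maxLen = cellLen;
            maxKey = Arrays.copyOf(key, key.length);
        }
    }

    /** Value that would be stored as "Len Of Biggest Cell". */
    int lenOfBiggestCell() {
        return maxLen;
    }

    /** Value that would be stored as "Key Of Biggest Cell"; null if no cell was seen. */
    byte[] keyOfBiggestCell() {
        return maxKey == null ? null : Arrays.copyOf(maxKey, maxKey.length);
    }
}
```

When the file is closed, the two values would be written into its fileinfo so that the web UI can read them back without scanning the data blocks.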





[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer

2022-05-08 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533612#comment-17533612
 ] 

Zheng Wang commented on HBASE-25768:


[~Xiaolin Ha] Yeah, you are right, this patch is useful for some cases.

> Support an overall coarse and fast balance strategy for StochasticLoadBalancer
> --
>
> Key: HBASE-25768
> URL: https://issues.apache.org/jira/browse/HBASE-25768
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> When we use StochasticLoadBalancer + balanceByTable, we could face two 
> difficulties.
>  # For each table, its regions are distributed uniformly, but across the 
> overall cluster there is still an imbalance between RSes;
>  # When there is a large-scale restart of RSes, or an expansion of groups or 
> the cluster, we hope the balancer can execute as soon as possible, but the 
> StochasticLoadBalancer may need a lot of time to compute costs.
> We can detect these circumstances in StochasticLoadBalancer (such as by using 
> the percentage of skewed tables), and before trying the normal balance steps, 
> we can add a strategy to let it just balance like the SimpleLoadBalancer or 
> use a few light cost functions here.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-05-04 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
External issue URL:   (was: 
https://issues.apache.org/jira/browse/HBASE-8665)

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compactions, we set selectNow to false, so file selection 
> is not done until the compaction actually runs. This has a side effect: 
> if another compaction is slow, we may put lots of compactions into the 
> queue, because the filesCompacting of HStore is empty in the meantime.
> As the example in the attachments shows, there are 154 regions and about 2000 
> hfiles, but the length of the compaction queue grows to 1391. This causes 
> confusion and may trigger unexpected alarms.
> My approach is to limit the compaction queue count, computed from 
> filesNotCompacting and hbase.hstore.compaction.max.
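
One way to read the approach described above is sketched here. The exact formula used in the patch is not given in this thread, so the ceiling division below is an assumption, and CompactionQueueLimiter and its method names are hypothetical rather than HBase APIs.

```java
/**
 * Hypothetical sketch (not the actual patch): bound the number of queued
 * compactions for one store by how many compactions its idle files could
 * possibly fill.
 */
final class CompactionQueueLimiter {

    /**
     * @param filesNotCompacting    store files not currently being compacted
     * @param maxFilesPerCompaction value of hbase.hstore.compaction.max
     * @return upper bound on compactions worth queuing for this store
     */
    static int maxQueuedCompactions(int filesNotCompacting, int maxFilesPerCompaction) {
        if (maxFilesPerCompaction <= 0) {
            throw new IllegalArgumentException("maxFilesPerCompaction must be positive");
        }
        if (filesNotCompacting <= 0) {
            return 0;
        }
        // Ceiling division: each queued compaction can select at most
        // maxFilesPerCompaction files (assumed rule, see lead-in).
        return (filesNotCompacting + maxFilesPerCompaction - 1) / maxFilesPerCompaction;
    }

    /** True if one more compaction request for this store should be queued. */
    static boolean shouldEnqueue(int alreadyQueued, int filesNotCompacting,
                                 int maxFilesPerCompaction) {
        return alreadyQueued < maxQueuedCompactions(filesNotCompacting, maxFilesPerCompaction);
    }
}
```

For example, with 25 idle files and a limit of 10 files per compaction, at most 3 requests would be queued for that store under this assumed rule, however slow the running compaction is.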





[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-05-04 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
External issue URL: https://issues.apache.org/jira/browse/HBASE-8665

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.
> My approach is limit the compaction queue count, by compute the 
> filesNotCompating and hbase.hstore.compaction.max.





[jira] [Commented] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-05-04 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531999#comment-17531999
 ] 

Zheng Wang commented on HBASE-26987:


BTW, this issue seems to have been introduced by HBASE-8665.

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.
> My approach is limit the compaction queue count, by compute the 
> filesNotCompating and hbase.hstore.compaction.max.





[jira] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-05-04 Thread Zheng Wang (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26987 ]


Zheng Wang deleted comment on HBASE-26987:


was (Author: filtertip):
Until this issue is resolved, I think we should have a config to disable this 
feature. 

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.
> My approach is limit the compaction queue count, by compute the 
> filesNotCompating and hbase.hstore.compaction.max.





[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-04-30 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Description: 
For some system compaction, we set the selectNow to false, so the file 
selecting will not be done until the compaction running, it brings side effect, 
if another compacting is slow, we may put lots of compaction to queue, because 
the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger unexpected alarm.

My approach is limit the compaction queue count, by compute the 
filesNotCompating and hbase.hstore.compaction.max.

  was:
For some system compaction, we set the selectNow to false, so the file 
selecting will not be done until the compaction running, it brings side effect, 
if another compacting is slow, we may put lots of compaction to queue, because 
the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger unexpected alarm.


> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.
> My approach is limit the compaction queue count, by compute the 
> filesNotCompating and hbase.hstore.compaction.max.





[jira] [Assigned] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-04-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang reassigned HBASE-26987:
--

Assignee: Zheng Wang

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.





[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-04-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Description: 
For some system compaction, we set the selectNow to false, so the file 
selecting will not be done until the compaction running, it brings side effect, 
if another compacting is slow, we may put lots of compaction to queue, because 
the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger unexpected alarm.

  was:
For some system compaction, we set the selectNow to false, so the file 
selecting will not be done until the compaction running, it brings side effect, 
if another compacting is slow, we may put lots of compaction to queue, because 
the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger wrong alarm.


> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger unexpected alarm.





[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-04-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Description: 
For some system compaction, we set the selectNow to false, so the file 
selecting will not be done until the compaction running, it brings side effect, 
if another compacting is slow, we may put lots of compaction to queue, because 
the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger wrong alarm.

  was:
For some system compaction, we set the selectNow to false, so the file 
selecting will be done until the compaction running, but it brings a side 
effect, if another compacting is slow, we may put lots of compaction to queue, 
because the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger wrong alarm.


> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will not be done until the compaction running, it brings side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger wrong alarm.





[jira] [Updated] (HBASE-26987) The length of compact queue grows too big when the compacting is slow

2022-04-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Summary: The length of compact queue grows too big when the compacting is 
slow  (was: The length of compact queue too big when the compacting is slow)

> The length of compact queue grows too big when the compacting is slow
> -
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will be done until the compaction running, but it brings a side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger wrong alarm.





[jira] [Commented] (HBASE-25768) Support an overall coarse and fast balance strategy for StochasticLoadBalancer

2022-04-28 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529777#comment-17529777
 ] 

Zheng Wang commented on HBASE-25768:


I encountered a similar issue recently. A cluster has 1000+ tables; when I 
enabled balanceByTable, it spent several hours doing the balance. In the end I 
disabled it and set hbase.master.balancer.stochastic.tableSkewCost to 1000 
instead, and it works well.

 

> Support an overall coarse and fast balance strategy for StochasticLoadBalancer
> --
>
> Key: HBASE-25768
> URL: https://issues.apache.org/jira/browse/HBASE-25768
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 3.0.0-alpha-1, 2.0.0, 1.4.13
>Reporter: Xiaolin Ha
>Assignee: Xiaolin Ha
>Priority: Major
>
> When we use StochasticLoadBalancer + balanceByTable, we could face two 
> difficulties.
>  # For each table, their regions are distributed uniformly, but for the 
> overall cluster, still exiting imbalance between RSes;
>  # When there are large-scaled restart of RSes, or expansion for groups or 
> cluster, we hope the balancer can execute as soon as possible, but the 
> StochasticLoadBalancer may need a lot of time to compute costs.
> We can detect these circumstances in StochasticLoadBalancer(such as using the 
> percentage of skew tables), and before the normal balance steps trying, we 
> can add a strategy to let it just balance like the SimpleLoadBalancer or use 
> few light cost functions here.
>  
>  





[jira] [Updated] (HBASE-26987) The length of compact queue too big when the compacting is slow

2022-04-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Description: 
For some system compaction, we set the selectNow to false, so the file 
selecting will be done until the compaction running, but it brings a side 
effect, if another compacting is slow, we may put lots of compaction to queue, 
because the filesCompacting of Hstore is empty in the meantime.

An example shows at attachments, there are 154 regions and about 2000 hfiles, 
but the length of compact queue grows to 1391, it cause confusion and may 
trigger wrong alarm.

  was:For some system compaction, we set the selectNow to false, so the file 
selecting will be done until the compaction running, but it brings a side 
effect, if another compacting is slow, we may put lots of compaction to queue, 
because the filesCompacting of Hstore is empty in the meantime.


> The length of compact queue too big when the compacting is slow
> ---
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will be done until the compaction running, but it brings a side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.
> An example shows at attachments, there are 154 regions and about 2000 hfiles, 
> but the length of compact queue grows to 1391, it cause confusion and may 
> trigger wrong alarm.





[jira] [Commented] (HBASE-26987) The length of compact queue too big when the compacting is slow

2022-04-28 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529724#comment-17529724
 ] 

Zheng Wang commented on HBASE-26987:


Until this issue is resolved, I think we should have a config to disable this 
feature. 

> The length of compact queue too big when the compacting is slow
> ---
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will be done until the compaction running, but it brings a side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.





[jira] [Updated] (HBASE-26987) The length of compact queue too big when the compacting is slow

2022-04-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26987:
---
Summary: The length of compact queue too big when the compacting is slow  
(was: The length of compact queue will be wrong when the compacting is slow)

> The length of compact queue too big when the compacting is slow
> ---
>
> Key: HBASE-26987
> URL: https://issues.apache.org/jira/browse/HBASE-26987
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Wang
>Priority: Major
> Attachments: image-2022-04-29-10-26-09-351.png, 
> image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png
>
>
> For some system compaction, we set the selectNow to false, so the file 
> selecting will be done until the compaction running, but it brings a side 
> effect, if another compacting is slow, we may put lots of compaction to 
> queue, because the filesCompacting of Hstore is empty in the meantime.





[jira] [Created] (HBASE-26987) The length of compact queue will be wrong when the compacting is slow

2022-04-28 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-26987:
--

 Summary: The length of compact queue will be wrong when the 
compacting is slow
 Key: HBASE-26987
 URL: https://issues.apache.org/jira/browse/HBASE-26987
 Project: HBase
  Issue Type: Improvement
Reporter: Zheng Wang
 Attachments: image-2022-04-29-10-26-09-351.png, 
image-2022-04-29-10-26-18-323.png, image-2022-04-29-10-26-24-087.png

For some system compaction, we set the selectNow to false, so the file 
selecting will be done until the compaction running, but it brings a side 
effect, if another compacting is slow, we may put lots of compaction to queue, 
because the filesCompacting of Hstore is empty in the meantime.





[jira] [Commented] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-04-04 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516864#comment-17516864
 ] 

Zheng Wang commented on HBASE-26885:


Thanks for the review and push.  [~zhangduo] 

> The TRSP should not go on when it get a bogus server name from AM
> -
>
> Key: HBASE-26885
> URL: https://issues.apache.org/jira/browse/HBASE-26885
> Project: HBase
>  Issue Type: Improvement
>  Components: proc-v2
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
>
>
> Currently it will submit lots of unnecessary OpenRegionProcedure by retry.
> Related log looks like below, 'localhost,1,1' is the bogus server:
> {code:java}
> 2022-03-22 10:17:48,301 WARN  [PEWorker-8] 
> assignment.RegionRemoteProcedureBase: Can not add remote operation pid=17952, 
> ppid=17951, state=RUNNABLE, locked=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
> {ENCODED => 490391c232c7aa13f7e0d50bfe1f7235, NAME => 
> 'TestTable1,002497747,1647568640784.490391c232c7aa13f7e0d50bfe1f7235.',
>  STARTKEY => '002497747', ENDKEY => ''} to server 
> localhost,1,1, this usually because the server is alread dead, give up and 
> mark the procedure as complete, the parent procedure will take care of this.
> org.apache.hadoop.hbase.procedure2.NoServerDispatchException: localhost,1,1; 
> pid=17952, ppid=17951, state=RUNNABLE, locked=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
>         at 
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:168)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:285)
>         at 
> org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
> 2022-03-22 10:17:48,301 DEBUG [PEWorker-8] procedure2.RootProcedureState: Add 
> procedure pid=17952, ppid=17951, state=SUCCESS, locked=true; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 8th 
> rollback step {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-30 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514473#comment-17514473
 ] 

Zheng Wang commented on HBASE-26885:


Add an addendum PR. 

We should throw an exception instead of returning directly, to avoid frequent re-execution.

 



[jira] [Reopened] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-29 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang reopened HBASE-26885:




[jira] [Commented] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-28 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513767#comment-17513767
 ] 

Zheng Wang commented on HBASE-26885:


Pushed to 2.4+, thanks for all the comments. [~vjasani] [~zhangduo] 
[~pankajkumar] 



[jira] [Resolved] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-26885.

Fix Version/s: 2.5.0, 2.6.0, 3.0.0-alpha-3, 2.4.12
Resolution: Fixed



[jira] [Updated] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26885:
---
Description: 
Currently it will submit lots of unnecessary OpenRegionProcedure by retry.

Related log looks like below, 'localhost,1,1' is the bogus server:
{code:java}
2022-03-22 10:17:48,301 WARN  [PEWorker-8] 
assignment.RegionRemoteProcedureBase: Can not add remote operation pid=17952, 
ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
{ENCODED => 490391c232c7aa13f7e0d50bfe1f7235, NAME => 
'TestTable1,002497747,1647568640784.490391c232c7aa13f7e0d50bfe1f7235.',
 STARTKEY => '002497747', ENDKEY => ''} to server 
localhost,1,1, this usually because the server is alread dead, give up and mark 
the procedure as complete, the parent procedure will take care of this.
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: localhost,1,1; 
pid=17952, ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
        at 
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:168)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:285)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
2022-03-22 10:17:48,301 DEBUG [PEWorker-8] procedure2.RootProcedureState: Add 
procedure pid=17952, ppid=17951, state=SUCCESS, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 8th 
rollback step {code}

  was:
Currently it will submit lots of unnecessary OpenRegionProcedure by retry.

Related log looks like below, "localhost,1,1" is the bogus server:
{code:java}
2022-03-22 10:17:48,301 WARN  [PEWorker-8] 
assignment.RegionRemoteProcedureBase: Can not add remote operation pid=17952, 
ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
{ENCODED => 490391c232c7aa13f7e0d50bfe1f7235, NAME => 
'TestTable1,002497747,1647568640784.490391c232c7aa13f7e0d50bfe1f7235.',
 STARTKEY => '002497747', ENDKEY => ''} to server 
localhost,1,1, this usually because the server is alread dead, give up and mark 
the procedure as complete, the parent procedure will take care of this.
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: localhost,1,1; 
pid=17952, ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
        at 
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:168)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:285)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
2022-03-22 10:17:48,301 DEBUG [PEWorker-8] procedure2.RootProcedureState: Add 
procedure pid=17952, ppid=17951, state=SUCCESS, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 8th 
rollback step {code}



[jira] [Updated] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26885:
---
Description: 
Currently it will submit lots of unnecessary OpenRegionProcedure by retry.

Related log looks like below, "localhost,1,1" is the bogus server:
{code:java}
2022-03-22 10:17:48,301 WARN  [PEWorker-8] 
assignment.RegionRemoteProcedureBase: Can not add remote operation pid=17952, 
ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
{ENCODED => 490391c232c7aa13f7e0d50bfe1f7235, NAME => 
'TestTable1,002497747,1647568640784.490391c232c7aa13f7e0d50bfe1f7235.',
 STARTKEY => '002497747', ENDKEY => ''} to server 
localhost,1,1, this usually because the server is alread dead, give up and mark 
the procedure as complete, the parent procedure will take care of this.
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: localhost,1,1; 
pid=17952, ppid=17951, state=RUNNABLE, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
        at 
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.addOperationToNode(RemoteProcedureDispatcher.java:168)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:285)
        at 
org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
2022-03-22 10:17:48,301 DEBUG [PEWorker-8] procedure2.RootProcedureState: Add 
procedure pid=17952, ppid=17951, state=SUCCESS, locked=true; 
org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure as the 8th 
rollback step {code}

  was:Currently it will submit lots of unnecessary OpenRegionProcedure by retry.



[jira] [Comment Edited] (HBASE-26884) Find unavailable regions by the startcode checking on hmaster start up and reassign them

2022-03-28 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513293#comment-17513293
 ] 

Zheng Wang edited comment on HBASE-26884 at 3/28/22, 10:32 AM:
---

Seen this in 2.0 (cdh6.0.1, about a year ago) and in 2.2.0 recently; not sure 
whether it could happen without misoperation by the user.

BTW, all of these cases came along with a server crash. [~anoop.hbase] 

 


was (Author: filtertip):
Seen this in 2.0(cdh6.0.1, at about 1 years ago) and 2.2.0 rencently, not sure 
it could happen without misoperation by user.   [~anoop.hbase] 

 

> Find unavailable regions by the startcode checking on hmaster start up and 
> reassign them
> 
>
> Key: HBASE-26884
> URL: https://issues.apache.org/jira/browse/HBASE-26884
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> Sometimes we see regions in open or opening state that are not deployed on 
> any rs and have no procs for them; after checking the meta table, we find 
> that the startcodes of their locations have expired. 
> This is not easy to reproduce; it may be caused by a corner-case bug or by 
> user misoperation.
> My approach is to add a check on hmaster start up: if the startcode of the 
> regionLocation has expired, and there is neither a TRSP on the region nor an 
> SCP on the regionserver, then we reassign the region. The problem can then be 
> resolved simply by restarting the hmaster. 
> Hbck2 may also be useful in some of these cases, but it is not easy for 
> ordinary users to use, especially when the number of such regions is not 
> small and they need to be recovered quickly.





[jira] [Updated] (HBASE-26884) Find unavailable regions by the startcode checking on hmaster start up and reassign them

2022-03-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26884:
---
Description: 
Sometimes we see regions in open or opening state that are not deployed on 
any rs and have no procs for them; after checking the meta table, we find 
that the startcodes of their locations have expired. 

This is not easy to reproduce; it may be caused by a corner-case bug or by 
user misoperation.

My approach is to add a check on hmaster start up: if the startcode of the 
regionLocation has expired, and there is neither a TRSP on the region nor an 
SCP on the regionserver, then we reassign the region. The problem can then be 
resolved simply by restarting the hmaster. 

Hbck2 may also be useful in some of these cases, but it is not easy for 
ordinary users to use, especially when the number of such regions is not 
small and they need to be recovered quickly.

  was:
Sometimes we have seen there are regions in open or opening state, but does not 
deployed on any rs and without procs for them, and afting  checking the meta 
table, we find these startcode are expired. 

It is no easy to reproduce, may be caused by corner bug or user misoperation.

My approach is add some checking on hmaster start up, if the startcode of the 
regionLocation expired, and neither TRSP on region nor SCP on regionserver, 
then we should reassign the region, then we can resovle it easily just by 
restart hmaster. 

Hbck2 maybe also useful for some of them cases, but not easily for common user 
to use, especially the number of these regions not small and need to be 
recovery quickly.




[jira] [Commented] (HBASE-26884) Find unavailable regions by the startcode checking on hmaster start up and reassign them

2022-03-28 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513293#comment-17513293
 ] 

Zheng Wang commented on HBASE-26884:


Seen this in 2.0 (cdh6.0.1, about a year ago) and in 2.2.0 recently; not sure 
whether it could happen without misoperation by the user.   [~anoop.hbase] 

 



[jira] [Created] (HBASE-26885) The TRSP should not go on when it get a bogus server name from AM

2022-03-25 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-26885:
--

 Summary: The TRSP should not go on when it get a bogus server name 
from AM
 Key: HBASE-26885
 URL: https://issues.apache.org/jira/browse/HBASE-26885
 Project: HBase
  Issue Type: Improvement
  Components: proc-v2
Reporter: Zheng Wang
Assignee: Zheng Wang


Currently it will submit lots of unnecessary OpenRegionProcedure by retry.





[jira] [Created] (HBASE-26884) Find unavailable regions by the startcode checking on hmaster start up and reassign them

2022-03-25 Thread Zheng Wang (Jira)
Zheng Wang created HBASE-26884:
--

 Summary: Find unavailable regions by the startcode checking on 
hmaster start up and reassign them
 Key: HBASE-26884
 URL: https://issues.apache.org/jira/browse/HBASE-26884
 Project: HBase
  Issue Type: Improvement
  Components: master
Reporter: Zheng Wang
Assignee: Zheng Wang


Sometimes we see regions in open or opening state that are not deployed on 
any rs and have no procs for them; after checking the meta table, we find 
that the startcodes of their locations have expired. 

This is not easy to reproduce; it may be caused by a corner-case bug or by 
user misoperation.

My approach is to add a check on hmaster start up: if the startcode of the 
regionLocation has expired, and there is neither a TRSP on the region nor an 
SCP on the regionserver, then we reassign the region. The problem can then be 
resolved simply by restarting the hmaster. 

Hbck2 may also be useful in some of these cases, but it is not easy for 
ordinary users to use, especially when the number of such regions is not 
small and they need to be recovered quickly.
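
A minimal sketch of the check described above, in plain Java with no HBase 
dependency. Everything in it is an assumption made for illustration: the class 
StaleLocationSketch, the method findRegionsToReassign and the string-based 
maps and sets are hypothetical stand-ins, not the master's real data 
structures. It also assumes that "startcode expired" means the recorded 
"host,port,startcode" location does not match any live server instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the proposed startup check; not HBase code. */
class StaleLocationSketch {

  /**
   * Returns the regions whose recorded location is not a live server instance
   * and that no procedure is already taking care of.
   * Server names use the "host,port,startcode" form, so a restarted server
   * (same host and port, new startcode) no longer matches the old location.
   */
  static List<String> findRegionsToReassign(
      Map<String, String> regionLocations,  // region -> "host,port,startcode" from meta
      Set<String> liveServers,              // server names currently online
      Set<String> regionsWithTrsp,          // regions that already have a TRSP
      Set<String> serversWithScp) {         // servers that already have an SCP
    List<String> toReassign = new ArrayList<>();
    for (Map.Entry<String, String> entry : regionLocations.entrySet()) {
      String region = entry.getKey();
      String location = entry.getValue();
      boolean startcodeExpired = !liveServers.contains(location);
      boolean alreadyHandled =
          regionsWithTrsp.contains(region) || serversWithScp.contains(location);
      if (startcodeExpired && !alreadyHandled) {
        toReassign.add(region);
      }
    }
    return toReassign;
  }

  public static void main(String[] args) {
    Map<String, String> locations = new LinkedHashMap<>();
    locations.put("region-a", "rs1,16020,100"); // rs1 restarted, its startcode is now 200
    locations.put("region-b", "rs2,16020,300"); // location is still live
    locations.put("region-c", "rs3,16020,400"); // server is gone, but an SCP exists
    Set<String> live = new HashSet<>(Arrays.asList("rs1,16020,200", "rs2,16020,300"));
    Set<String> trsp = new HashSet<>();
    Set<String> scp = new HashSet<>(Arrays.asList("rs3,16020,400"));
    System.out.println(findRegionsToReassign(locations, live, trsp, scp));
  }
}
```

Only regions that are both stale and unhandled are returned, so a region that 
an existing TRSP or SCP will take care of is left alone.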





[jira] [Resolved] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-09 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-26027.

Resolution: Fixed
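
The hang reported in this issue can be modelled without HBase. The sketch 
below is a simplified, hypothetical stand-in, not the client code: the class 
BatchHangSketch, the type FakeResult and the method failAllThenWait are 
invented for illustration, and a RuntimeException plays the role of the 
exception that has to be stored. It shows how storing an exception into an 
array with a narrower element type throws ArrayStoreException before the 
counter is decremented, and how a cutoff lets the wait return instead of 
spinning forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Simplified, hypothetical model of the hang; not the HBase client code. */
class BatchHangSketch {
  /** Stand-in for a narrow result type such as Result. */
  static final class FakeResult {}

  private final Object[] results;
  private final AtomicLong actionsInProgress;

  BatchHangSketch(Object[] results) {
    this.results = results;
    this.actionsInProgress = new AtomicLong(results.length);
  }

  /** Stores the outcome, then decrements the counter. */
  void updateResult(int index, Object outcome) {
    // Throws ArrayStoreException when the runtime array type cannot hold outcome.
    results[index] = outcome;
    // Never reached when the store above throws.
    actionsInProgress.decrementAndGet();
  }

  /** Waits for the counter to reach zero; returns false once the cutoff passes. */
  boolean waitUntilDone(long cutoffMs) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(cutoffMs);
    while (actionsInProgress.get() > 0) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of checking the counter forever
      }
      Thread.sleep(10);
    }
    return true;
  }

  /** Fails every action, then waits; returns whether the batch completed. */
  static boolean failAllThenWait(Object[] results, long cutoffMs) {
    BatchHangSketch batch = new BatchHangSketch(results);
    for (int i = 0; i < results.length; i++) {
      try {
        batch.updateResult(i, new RuntimeException("request failed"));
      } catch (ArrayStoreException e) {
        // Assumption: the worker only reports the error, so the counter stays up.
      }
    }
    try {
      return batch.waitUntilDone(cutoffMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(failAllThenWait(new FakeResult[3], 200)); // narrow array
    System.out.println(failAllThenWait(new Object[3], 200));     // Object[] array
  }
}
```

Without the cutoff check in waitUntilDone, the first call in main would never 
return, which is the behaviour the issue describes.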

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0
>
>
> The batch API of HTable takes a parameter named results that stores each 
> result or exception; its type is Object[].
> If the user passes an array with another element type, e.g. 
> org.apache.hadoop.hbase.client.Result, and we need to put an exception 
> into it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult, so 
> AsyncRequestFutureImpl.decActionCounter is skipped. 
> AsyncRequestFutureImpl.waitUntilDone then gets stuck checking 
> actionsInProgress again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout instead 
> of depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation was refactored in 3.x.
> How to reproduce:
> 1: add a sleep in RSRpcServices.multi to mock a slow response
> {code:java}
> // Delay every multi call so that it reaches hbase.rpc.timeout
> try {
>   Thread.sleep(2000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> {code}
> 2: set the timeouts in the config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call the batch API
> {code:java}
> Table table = HbaseUtil.getTable("test");
> byte[] cf = Bytes.toBytes("f");
> byte[] c = Bytes.toBytes("c1");
> List<Get> gets = new ArrayList<>();
> for (int i = 0; i < 10; i++) {
>   byte[] rk = Bytes.toBytes("rk-" + i);
>   Get get = new Get(rk);
>   get.addColumn(cf, c);
>   gets.add(get);
> }
> // Result[] instead of Object[]: an exception cannot be stored in it
> Result[] results = new Result[gets.size()];
> table.batch(gets, results);
> {code}
> The log will look like below:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to finish on table: test
> {code}
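
The mechanism described in the issue can be sketched outside HBase. The following is only an illustration under stated assumptions, not the HBase code or the merged patch: FakeResult, tryStore and waitUntilDone are hypothetical stand-ins, and the 200 ms cutoff is arbitrary. It shows why storing an exception into an array created as a narrower type fails at runtime, and how a wait loop bounded by a deadline returns even if the pending-action counter is never decremented.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BatchHangSketch {
    // Hypothetical stand-in for org.apache.hadoop.hbase.client.Result.
    static class FakeResult {}

    // Tries to store an outcome (a result or an exception) into the caller's array.
    // Java arrays are covariant, so a FakeResult[] can be passed as Object[],
    // but the JVM checks the element type on every store.
    static boolean tryStore(Object[] results, int index, Object outcome) {
        try {
            results[index] = outcome;
            return true;
        } catch (ArrayStoreException e) {
            return false;
        }
    }

    // Waits until the counter reaches zero or the cutoff elapses.
    // Returns true if all actions finished, false if the cutoff was hit.
    static boolean waitUntilDone(AtomicLong actionsInProgress, long cutoffMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(cutoffMillis);
        while (actionsInProgress.get() != 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up instead of polling forever
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Exception failure = new IOException("simulated server error");

        Object[] typed = new FakeResult[1]; // analogous to new Result[n]
        Object[] plain = new Object[1];     // matches the declared parameter type
        System.out.println(tryStore(typed, 0, failure)); // prints false
        System.out.println(tryStore(plain, 0, failure)); // prints true

        // The counter is never decremented, as when decActionCounter is skipped.
        AtomicLong pending = new AtomicLong(1);
        System.out.println(waitUntilDone(pending, 200)); // prints false
    }
}
```

On the caller side, allocating the results array as new Object[gets.size()], the declared parameter type of batch, should avoid the failing store regardless of the client version.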





[jira] [Commented] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-09 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456906#comment-17456906
 ] 

Zheng Wang commented on HBASE-26027:


Merged to branch-2, thanks for the review. [~apurtell] 

 

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0
>
>


[jira] [Updated] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-08 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26027:
---
Fix Version/s: 2.6.0

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.6.0
>
>


[jira] [Comment Edited] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-08 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456137#comment-17456137
 ] 

Zheng Wang edited comment on HBASE-26027 at 12/9/21, 4:33 AM:
--

Pushed a new PR #3925, just for branch-2.

[~apurtell] 


was (Author: filtertip):
Pushed a new PR #3925, just for branch-2.

There is no plan to push to 2.3.x and 2.4.x, since they are already stable.

[~apurtell] 

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>


[jira] [Comment Edited] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-08 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456137#comment-17456137
 ] 

Zheng Wang edited comment on HBASE-26027 at 12/9/21, 4:29 AM:
--

Pushed a new PR #3925, just for branch-2.

There is no plan to push to 2.3.x and 2.4.x, since they are already stable.

[~apurtell] 


was (Author: filtertip):
Pushed a new PR #3925, just for branch-2.

There is no plan to push to 2.3.x and 2.4.x, since they are already stable.

 

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>

[jira] [Commented] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-08 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17456137#comment-17456137
 ] 

Zheng Wang commented on HBASE-26027:


Pushed a new PR #3925, just for branch-2.

There is no plan to push to 2.3.x and 2.4.x, since they are already stable.

 

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.5.0, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>

[jira] [Work started] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-12-07 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26027 started by Zheng Wang.
--
> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 2.5.0, 2.3.8, 2.4.9
>
>

[jira] [Commented] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-07-21 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384823#comment-17384823
 ] 

Zheng Wang commented on HBASE-26027:


Will dig into it later, thanks for the finding. [~apurtell]

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.2.7, 2.3.5, 2.4.4
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 2.5.0, 2.3.6, 2.4.6
>
>
> The batch API of HTable takes a parameter named results, of type Object[],
> which stores the result or exception of each action.
> If the user passes an array of a narrower type, e.g. an array of
> org.apache.hadoop.hbase.client.Result, and an exception has to be put into
> it for some reason, an ArrayStoreException is thrown in
> AsyncRequestFutureImpl.updateResult. AsyncRequestFutureImpl.decActionCounter
> is then skipped, so AsyncRequestFutureImpl.waitUntilDone gets stuck checking
> actionsInProgress again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead
> of depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation was refactored in
> 3.x.
> How to reproduce:
> 1: add a sleep in RSRpcServices.multi to mock a slow response
> {code:java}
> try {
>   Thread.sleep(2000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> {code}
> 2: set the timeouts in the config
> {code:java}
> conf.set("hbase.rpc.timeout", "2000");
> conf.set("hbase.client.operation.timeout", "6000");
> {code}
> 3: call the batch API
> {code:java}
> Table table = HbaseUtil.getTable("test");
> byte[] cf = Bytes.toBytes("f");
> byte[] c = Bytes.toBytes("c1");
> List<Get> gets = new ArrayList<>();
> for (int i = 0; i < 10; i++) {
>   byte[] rk = Bytes.toBytes("rk-" + i);
>   Get get = new Get(rk);
>   get.addColumn(cf, c);
>   gets.add(get);
> }
> // Passing Result[] instead of Object[] is what triggers the bug.
> Result[] results = new Result[gets.size()];
> table.batch(gets, results);
> {code}
> The log will look like this:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-26036) DBB released too early in HRegion.get() and dirty data for some operations

2021-07-01 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373233#comment-17373233
 ] 

Zheng Wang commented on HBASE-26036:


[~Xiaolin Ha] OK, got it.

> DBB released too early in HRegion.get() and dirty data for some operations
> --
>
> Key: HBASE-26036
> URL: https://issues.apache.org/jira/browse/HBASE-26036
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 3.0.0-alpha-1, 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Critical
>
> Before HBASE-25187, we found regionserver JVM crashes on our production
> clusters. The core dump info is as follows:
> {code:java}
> Stack: [0x7f621ba8d000,0x7f621bb8e000],  sp=0x7f621bb8c0e0,  free 
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 10829 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getTimestamp()J (9 
> bytes) @ 0x7f6a5ee11b2d [0x7f6a5ee11ae0+0x4d]
> J 22844 C2 
> org.apache.hadoop.hbase.regionserver.HRegion.doCheckAndRowMutate([B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/client/RowMutations;Lorg/apache/hadoop/hbase/client/Mutation;Z)Z
>  (540 bytes) @ 0x7f6a60bed144 [0x7f6a60beb320+0x1e24]
> J 17972 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkAndRowMutate(Lorg/apache/hadoop/hbase/regionserver/Region;Ljava/util/List;Lorg/apache/hadoop/hbase/CellScanner;[B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;)Z
>  (312 bytes) @ 0x7f6a5f4a7ed0 [0x7f6a5f4a6f40+0xf90]
> J 26197 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(Lorg/apache/hbase/thirdparty/com/google/protobuf/RpcController;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiRequest;)Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiResponse;
>  (644 bytes) @ 0x7f6a61538b0c [0x7f6a61537940+0x11cc]
> J 26332 C2 
> org.apache.hadoop.hbase.ipc.RpcServer.call(Lorg/apache/hadoop/hbase/ipc/RpcCall;Lorg/apache/hadoop/hbase/monitoring/MonitoredRPCHandler;)Lorg/apache/hadoop/hbase/util/Pair;
>  (566 bytes) @ 0x7f6a615e8228 [0x7f6a615e79c0+0x868]
> J 20563 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1196 bytes) @ 
> 0x7f6a60711a4c [0x7f6a60711000+0xa4c]
> J 19656% C2 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V
>  (338 bytes) @ 0x7f6a6039a414 [0x7f6a6039a320+0xf4]
> j  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run()V+24
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {code}
> I have written a UT that reproduces this error 100% of the time.
> After HBASE-25187, the check result of checkAndMutate will be false,
> because it reads wrong/dirty data from the released ByteBuff.
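
To make the "dirty data from a released buffer" failure mode concrete, here is a minimal sketch in plain Java. It does not use any HBase classes: `EarlyReleaseDemo`, `Pool`, and `readAfterEarlyRelease` are hypothetical names, and the one-slot pool only stands in for a pooled buffer allocator. It assumes nothing about how HRegion.get() actually manages its buffers.

```java
import java.util.ArrayDeque;

final class EarlyReleaseDemo {
    // Hypothetical one-buffer pool standing in for a pooled buffer allocator.
    static final class Pool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();

        Pool() {
            free.push(new byte[8]);
        }

        byte[] acquire() {
            return free.pop();
        }

        void release(byte[] buffer) {
            free.push(buffer);
        }
    }

    // Returns the value the first reader sees after its buffer was released
    // too early and handed out again to another request.
    static byte readAfterEarlyRelease() {
        Pool pool = new Pool();
        byte[] cell = pool.acquire();
        cell[0] = 42;                  // data the first request still needs
        pool.release(cell);            // released too early
        byte[] other = pool.acquire(); // the same backing array is reused
        other[0] = 7;                  // the second request overwrites it
        return cell[0];                // the first request now reads dirty data
    }

    public static void main(String[] args) {
        System.out.println(readAfterEarlyRelease()); // prints 7, not 42
    }
}
```

With an on-heap array the stale read is silent; with off-heap memory the same pattern can also end in a JVM crash, as in the dump above.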





[jira] [Comment Edited] (HBASE-26036) DBB released too early in HRegion.get() and dirty data for some operations

2021-07-01 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373145#comment-17373145
 ] 

Zheng Wang edited comment on HBASE-26036 at 7/2/21, 2:01 AM:
-

HBASE-25187 does not seem related to this issue; should it be HBASE-25981? [~Xiaolin Ha]


was (Author: filtertip):
HBASE-25187 seems not related to this issue, should be HBASE-25981?

> DBB released too early in HRegion.get() and dirty data for some operations
> --
>
> Key: HBASE-26036
> URL: https://issues.apache.org/jira/browse/HBASE-26036
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 3.0.0-alpha-1, 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Critical
>
> Before HBASE-25187, we found regionserver JVM crashes on our production
> clusters. The core dump info is as follows:
> {code:java}
> Stack: [0x7f621ba8d000,0x7f621bb8e000],  sp=0x7f621bb8c0e0,  free 
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 10829 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getTimestamp()J (9 
> bytes) @ 0x7f6a5ee11b2d [0x7f6a5ee11ae0+0x4d]
> J 22844 C2 
> org.apache.hadoop.hbase.regionserver.HRegion.doCheckAndRowMutate([B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/client/RowMutations;Lorg/apache/hadoop/hbase/client/Mutation;Z)Z
>  (540 bytes) @ 0x7f6a60bed144 [0x7f6a60beb320+0x1e24]
> J 17972 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkAndRowMutate(Lorg/apache/hadoop/hbase/regionserver/Region;Ljava/util/List;Lorg/apache/hadoop/hbase/CellScanner;[B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;)Z
>  (312 bytes) @ 0x7f6a5f4a7ed0 [0x7f6a5f4a6f40+0xf90]
> J 26197 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(Lorg/apache/hbase/thirdparty/com/google/protobuf/RpcController;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiRequest;)Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiResponse;
>  (644 bytes) @ 0x7f6a61538b0c [0x7f6a61537940+0x11cc]
> J 26332 C2 
> org.apache.hadoop.hbase.ipc.RpcServer.call(Lorg/apache/hadoop/hbase/ipc/RpcCall;Lorg/apache/hadoop/hbase/monitoring/MonitoredRPCHandler;)Lorg/apache/hadoop/hbase/util/Pair;
>  (566 bytes) @ 0x7f6a615e8228 [0x7f6a615e79c0+0x868]
> J 20563 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1196 bytes) @ 
> 0x7f6a60711a4c [0x7f6a60711000+0xa4c]
> J 19656% C2 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V
>  (338 bytes) @ 0x7f6a6039a414 [0x7f6a6039a320+0xf4]
> j  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run()V+24
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {code}
> I have written a UT that reproduces this error 100% of the time.
> After HBASE-25187, the check result of checkAndMutate will be false,
> because it reads wrong/dirty data from the released ByteBuff.





[jira] [Commented] (HBASE-26036) DBB released too early in HRegion.get() and dirty data for some operations

2021-07-01 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373145#comment-17373145
 ] 

Zheng Wang commented on HBASE-26036:


HBASE-25187 does not seem related to this issue; should it be HBASE-25981?

> DBB released too early in HRegion.get() and dirty data for some operations
> --
>
> Key: HBASE-26036
> URL: https://issues.apache.org/jira/browse/HBASE-26036
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 3.0.0-alpha-1, 2.0.0
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Critical
>
> Before HBASE-25187, we found regionserver JVM crashes on our production
> clusters. The core dump info is as follows:
> {code:java}
> Stack: [0x7f621ba8d000,0x7f621bb8e000],  sp=0x7f621bb8c0e0,  free 
> space=1020k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 10829 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getTimestamp()J (9 
> bytes) @ 0x7f6a5ee11b2d [0x7f6a5ee11ae0+0x4d]
> J 22844 C2 
> org.apache.hadoop.hbase.regionserver.HRegion.doCheckAndRowMutate([B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/client/RowMutations;Lorg/apache/hadoop/hbase/client/Mutation;Z)Z
>  (540 bytes) @ 0x7f6a60bed144 [0x7f6a60beb320+0x1e24]
> J 17972 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.checkAndRowMutate(Lorg/apache/hadoop/hbase/regionserver/Region;Ljava/util/List;Lorg/apache/hadoop/hbase/CellScanner;[B[B[BLorg/apache/hadoop/hbase/filter/CompareFilter$CompareOp;Lorg/apache/hadoop/hbase/filter/ByteArrayComparable;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$RegionActionResult$Builder;)Z
>  (312 bytes) @ 0x7f6a5f4a7ed0 [0x7f6a5f4a6f40+0xf90]
> J 26197 C2 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(Lorg/apache/hbase/thirdparty/com/google/protobuf/RpcController;Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiRequest;)Lorg/apache/hadoop/hbase/shaded/protobuf/generated/ClientProtos$MultiResponse;
>  (644 bytes) @ 0x7f6a61538b0c [0x7f6a61537940+0x11cc]
> J 26332 C2 
> org.apache.hadoop.hbase.ipc.RpcServer.call(Lorg/apache/hadoop/hbase/ipc/RpcCall;Lorg/apache/hadoop/hbase/monitoring/MonitoredRPCHandler;)Lorg/apache/hadoop/hbase/util/Pair;
>  (566 bytes) @ 0x7f6a615e8228 [0x7f6a615e79c0+0x868]
> J 20563 C2 org.apache.hadoop.hbase.ipc.CallRunner.run()V (1196 bytes) @ 
> 0x7f6a60711a4c [0x7f6a60711000+0xa4c]
> J 19656% C2 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(Ljava/util/concurrent/BlockingQueue;Ljava/util/concurrent/atomic/AtomicInteger;)V
>  (338 bytes) @ 0x7f6a6039a414 [0x7f6a6039a320+0xf4]
> j  org.apache.hadoop.hbase.ipc.RpcExecutor$1.run()V+24
> j  java.lang.Thread.run()V+11
> v  ~StubRoutines::call_stub
> {code}
> I have written a UT that reproduces this error 100% of the time.
> After HBASE-25187, the check result of checkAndMutate will be false,
> because it reads wrong/dirty data from the released ByteBuff.





[jira] [Commented] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-07-01 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372663#comment-17372663
 ] 

Zheng Wang commented on HBASE-26027:


Thanks for all the comments. [~reidchan] [~zhangduo]

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.2.7, 2.3.5, 2.4.4
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> The batch API of HTable takes a parameter named results, of type Object[],
> which stores the result or exception of each action.
> If the user passes an array of a narrower type, e.g. an array of
> org.apache.hadoop.hbase.client.Result, and an exception has to be put into
> it for some reason, an ArrayStoreException is thrown in
> AsyncRequestFutureImpl.updateResult. AsyncRequestFutureImpl.decActionCounter
> is then skipped, so AsyncRequestFutureImpl.waitUntilDone gets stuck checking
> actionsInProgress again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead
> of depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation was refactored in
> 3.x.
> How to reproduce:
> 1: add a sleep in RSRpcServices.multi to mock a slow response
> {code:java}
> try {
>   Thread.sleep(2000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> {code}
> 2: set the timeouts in the config
> {code:java}
> conf.set("hbase.rpc.timeout", "2000");
> conf.set("hbase.client.operation.timeout", "6000");
> {code}
> 3: call the batch API
> {code:java}
> Table table = HbaseUtil.getTable("test");
> byte[] cf = Bytes.toBytes("f");
> byte[] c = Bytes.toBytes("c1");
> List<Get> gets = new ArrayList<>();
> for (int i = 0; i < 10; i++) {
>   byte[] rk = Bytes.toBytes("rk-" + i);
>   Get get = new Get(rk);
>   get.addColumn(cf, c);
>   gets.add(get);
> }
> // Passing Result[] instead of Object[] is what triggers the bug.
> Result[] results = new Result[gets.size()];
> table.batch(gets, results);
> {code}
> The log will look like this:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}





[jira] [Resolved] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-07-01 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-26027.

Resolution: Fixed

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.2.7, 2.3.5, 2.4.4
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> The batch API of HTable takes a parameter named results, of type Object[],
> which stores the result or exception of each action.
> If the user passes an array of a narrower type, e.g. an array of
> org.apache.hadoop.hbase.client.Result, and an exception has to be put into
> it for some reason, an ArrayStoreException is thrown in
> AsyncRequestFutureImpl.updateResult. AsyncRequestFutureImpl.decActionCounter
> is then skipped, so AsyncRequestFutureImpl.waitUntilDone gets stuck checking
> actionsInProgress again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead
> of depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation was refactored in
> 3.x.
> How to reproduce:
> 1: add a sleep in RSRpcServices.multi to mock a slow response
> {code:java}
> try {
>   Thread.sleep(2000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> {code}
> 2: set the timeouts in the config
> {code:java}
> conf.set("hbase.rpc.timeout", "2000");
> conf.set("hbase.client.operation.timeout", "6000");
> {code}
> 3: call the batch API
> {code:java}
> Table table = HbaseUtil.getTable("test");
> byte[] cf = Bytes.toBytes("f");
> byte[] c = Bytes.toBytes("c1");
> List<Get> gets = new ArrayList<>();
> for (int i = 0; i < 10; i++) {
>   byte[] rk = Bytes.toBytes("rk-" + i);
>   Get get = new Get(rk);
>   get.addColumn(cf, c);
>   gets.add(get);
> }
> // Passing Result[] instead of Object[] is what triggers the bug.
> Result[] results = new Result[gets.size()];
> table.batch(gets, results);
> {code}
> The log will look like this:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}





[jira] [Updated] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-07-01 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26027:
---
Fix Version/s: 2.4.5
   2.3.6
   2.5.0

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 2.2.7, 2.3.5, 2.4.4
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 2.5.0, 2.3.6, 2.4.5
>
>
> The batch API of HTable takes a parameter named results, of type Object[],
> which stores the result or exception of each action.
> If the user passes an array of a narrower type, e.g. an array of
> org.apache.hadoop.hbase.client.Result, and an exception has to be put into
> it for some reason, an ArrayStoreException is thrown in
> AsyncRequestFutureImpl.updateResult. AsyncRequestFutureImpl.decActionCounter
> is then skipped, so AsyncRequestFutureImpl.waitUntilDone gets stuck checking
> actionsInProgress again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead
> of depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation was refactored in
> 3.x.
> How to reproduce:
> 1: add a sleep in RSRpcServices.multi to mock a slow response
> {code:java}
> try {
>   Thread.sleep(2000);
> } catch (InterruptedException e) {
>   e.printStackTrace();
> }
> {code}
> 2: set the timeouts in the config
> {code:java}
> conf.set("hbase.rpc.timeout", "2000");
> conf.set("hbase.client.operation.timeout", "6000");
> {code}
> 3: call the batch API
> {code:java}
> Table table = HbaseUtil.getTable("test");
> byte[] cf = Bytes.toBytes("f");
> byte[] c = Bytes.toBytes("c1");
> List<Get> gets = new ArrayList<>();
> for (int i = 0; i < 10; i++) {
>   byte[] rk = Bytes.toBytes("rk-" + i);
>   Get get = new Get(rk);
>   get.addColumn(cf, c);
>   gets.add(get);
> }
> // Passing Result[] instead of Object[] is what triggers the bug.
> Result[] results = new Result[gets.size()];
> table.batch(gets, results);
> {code}
> The log will look like this:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}





[jira] [Updated] (HBASE-26028) The view as json page shows exception when using TinyLfuBlockCache

2021-06-30 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26028:
---
Fix Version/s: 2.4.5
   2.3.6
   2.5.0

> The view as json page shows exception when using TinyLfuBlockCache
> --
>
> Key: HBASE-26028
> URL: https://issues.apache.org/jira/browse/HBASE-26028
> Project: HBase
> Issue Type: Bug
> Components: UI
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 2.5.0, 2.3.6, 3.0.0-alpha-2, 2.4.5
>
> Attachments: HBASE-26028-afterpatch.jpg, HBASE-26028-beforepatch.jpg
>
>
> Some variables in TinyLfuBlockCache should be marked as transient.
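
The one-line description does not say which fields are affected or which serializer the page uses, so the following is only a general illustration of why `transient` matters here: reflection-based serializers commonly skip fields carrying that modifier. The sketch uses a hand-rolled reflective dump rather than any real JSON library, and `TransientDemo`, `CacheView`, and `visibleFields` are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

final class TransientDemo {
    // Hypothetical cache-like class; the field names are illustrative only.
    static class CacheView {
        long maxSize = 1024;
        transient Object internalPolicy = new Object(); // not meant for display
    }

    // Collects the non-static, non-transient fields of an object, which is
    // how many reflection-based serializers choose what to emit.
    static Map<String, Object> visibleFields(Object target) {
        Map<String, Object> out = new TreeMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                out.put(field.getName(), field.get(target));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(visibleFields(new CacheView())); // prints {maxSize=1024}
    }
}
```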





[jira] [Updated] (HBASE-26028) The view as json page shows exception when using TinyLfuBlockCache

2021-06-30 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26028:
---
Fix Version/s: 3.0.0-alpha-2

> The view as json page shows exception when using TinyLfuBlockCache
> --
>
> Key: HBASE-26028
> URL: https://issues.apache.org/jira/browse/HBASE-26028
> Project: HBase
> Issue Type: Bug
> Components: UI
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Fix For: 3.0.0-alpha-2
>
> Attachments: HBASE-26028-afterpatch.jpg, HBASE-26028-beforepatch.jpg
>
>
> Some variables in TinyLfuBlockCache should be marked as transient.





[jira] [Resolved] (HBASE-26028) The view as json page shows exception when using TinyLfuBlockCache

2021-06-30 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang resolved HBASE-26028.

Resolution: Fixed

> The view as json page shows exception when using TinyLfuBlockCache
> --
>
> Key: HBASE-26028
> URL: https://issues.apache.org/jira/browse/HBASE-26028
> Project: HBase
> Issue Type: Bug
> Components: UI
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Attachments: HBASE-26028-afterpatch.jpg, HBASE-26028-beforepatch.jpg
>
>
> Some variables in TinyLfuBlockCache should be marked as transient.





[jira] [Commented] (HBASE-26028) The view as json page shows exception when using TinyLfuBlockCache

2021-06-30 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372313#comment-17372313
 ] 

Zheng Wang commented on HBASE-26028:


Thanks for the review. [~vjasani]

> The view as json page shows exception when using TinyLfuBlockCache
> --
>
> Key: HBASE-26028
> URL: https://issues.apache.org/jira/browse/HBASE-26028
> Project: HBase
> Issue Type: Bug
> Components: UI
> Reporter: Zheng Wang
> Assignee: Zheng Wang
> Priority: Major
> Attachments: HBASE-26028-afterpatch.jpg, HBASE-26028-beforepatch.jpg
>
>
> Some variables in TinyLfuBlockCache should be marked as transient.





[jira] [Updated] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-28 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26027:
---
Description: 
The batch api of HTable contains a param named results to store result or 
exception, its type is Object[].

If the user passes an array of another type, e.g. 
org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
it for some reason, an ArrayStoreException is thrown in 
AsyncRequestFutureImpl.updateResult. As a result, 
AsyncRequestFutureImpl.decActionCounter is skipped, and 
AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
again and again, forever.
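
The root cause described above is Java array covariance, and it can be seen without HBase at all. The following is a minimal sketch, not HBase code: the class name CovariantArrayDemo and the method storeRejected are hypothetical, and a String[] stands in for the Result[] that the caller passes where Object[] is expected.

```java
import java.io.IOException;

public class CovariantArrayDemo {
    /**
     * Tries to store an exception object into slot 0 of the given array.
     * Returns true if the JVM rejected the store with ArrayStoreException.
     */
    public static boolean storeRejected(Object[] results) {
        try {
            // Compiles fine because the static type is Object[], but the
            // runtime element type of the array decides whether it succeeds.
            results[0] = new IOException("simulated failure");
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] typed = new String[1]; // stands in for a Result[] passed as Object[]
        Object[] plain = new Object[1]; // the array type the API expects
        System.out.println("typed array rejects exception: " + storeRejected(typed));
        System.out.println("plain array rejects exception: " + storeRejected(plain));
    }
}
```

Under this reading, a caller can avoid the exception by allocating the results array as Object[] rather than Result[]; the hang itself still needs the fix proposed in this issue.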

It would be better to add a cutoff calculated from operationTimeout, instead of 
depending only on the value of actionsInProgress.
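
A bounded wait of the kind proposed here could look roughly like the sketch below. This is an illustration only, not the actual AsyncRequestFutureImpl code: the class BoundedWaitDemo, the method signature, and the 10 ms polling interval are all assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BoundedWaitDemo {
    /**
     * Waits until the counter drops to zero or the timeout elapses.
     * Returns true if all actions finished, false if the cutoff was reached
     * (or the waiting thread was interrupted).
     */
    public static boolean waitUntilDone(AtomicLong actionsInProgress, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        synchronized (actionsInProgress) {
            while (actionsInProgress.get() > 0) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    // Cutoff reached: stop waiting even though the counter
                    // was never decremented.
                    return false;
                }
                try {
                    // Poll in short slices; never call wait(0), which blocks forever.
                    actionsInProgress.wait(Math.min(remainingMs, 10L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A counter that nobody decrements, as in the failure described above.
        AtomicLong stuck = new AtomicLong(10);
        System.out.println("finished: " + waitUntilDone(stuck, 200));
    }
}
```

With a cutoff like this, a lost decrement turns into a timeout that the caller can report, instead of an endless loop.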

BTW, this issue only affects 2.x, since the implementation has been refactored 
in 3.x.

How to reproduce:

1: Add a sleep in RSRpcServices.multi to simulate a slow response
{code:java}
try {
  Thread.sleep(2000);
} catch (InterruptedException e) {
  e.printStackTrace();
}
{code}
2: Set the timeouts in the config
{code:java}
conf.set("hbase.rpc.timeout","2000");
conf.set("hbase.client.operation.timeout","6000");
{code}
3: Call the batch API
{code:java}
Table table = HbaseUtil.getTable("test");
byte[] cf = Bytes.toBytes("f");
byte[] c = Bytes.toBytes("c1");
List<Get> gets = new ArrayList<>();
for (int i = 0; i < 10; i++) {
  byte[] rk = Bytes.toBytes("rk-" + i);
  Get get = new Get(rk);
  get.addColumn(cf, c);
  gets.add(get);
}
// A Result[] is passed where Object[] is expected; this triggers the issue
Result[] results = new Result[gets.size()];
table.batch(gets, results);
{code}
The log will look like the following:
{code:java}
[ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - id=1 error for test processing localhost,16020,1624343786295
java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
  at java.util.concurrent.FutureTask.run(FutureTask.java)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
[INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to finish on table: 
[INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to finish on table: test
{code}

  was:
The batch api of HTable contains a param named results to store result or 
exception, its type is Object[].

If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason, then the ArrayStoreException will occur in 
AsyncRequestFutureImpl.updateResult, then the 
AsyncRequestFutureImpl.decActionCounter will be skipped, then in the 
AsyncRequestFutureImpl.waitUntilDone we will stuck at here checking the 
actionsInProgress again and again, forever.

It is better to add an cutoff calculated by operationTimeout, instead of only 
depend on the value of actionsInProgress.

BTW, this issue only for 2.x, since 3.x the implement has refactored.

How to reproduce:

1: add sleep in RSRpcServices.multi to mock slow response


{code:java}
try {
 Thread.sleep(2000);
 } catch (InterruptedException e) {
 e.printStackTrace();
 }
{code}


2: set time out in config

{code:java}
conf.set("hbase.rpc.timeout","2000");
conf.set("hbase.client.operation.timeout","6000");
{code}


3: call 

[jira] [Comment Edited] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369449#comment-17369449
 ] 

Zheng Wang edited comment on HBASE-26027 at 6/25/21, 1:00 PM:
--

 
{quote}Don't really get this:

If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason

how to reproduce it?
{quote}
 

{color:#172b4d}Added to the description, thanks for the comment, [~reidchan].{color}


was (Author: filtertip):
 
{quote}Don't really get this:

If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason

how to reproduce it?
{quote}
 

{color:#172b4d}Added in description, thanks for the comment.{color}

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The batch API of HTable takes a parameter named results, of type Object[], 
> that stores results or exceptions.
> If the user passes an array of another type, e.g. 
> org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
> it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult. As a result, 
> AsyncRequestFutureImpl.decActionCounter is skipped, and 
> AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
> again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead of 
> depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation has been refactored 
> in 3.x.
> How to reproduce:
> 1: add sleep in RSRpcServices.multi to mock slow response
> {code:java}
> try {
>  Thread.sleep(2000);
>  } catch (InterruptedException e) {
>  e.printStackTrace();
>  }
> {code}
> 2: set time out in config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call batch api
> {code:java}
> Table table = HbaseUtil.getTable("test");
>  byte[] cf = Bytes.toBytes("f");
>  byte[] c = Bytes.toBytes("c1");
>  List gets = new ArrayList<>();
>  for (int i = 0; i < 10; i++) {
>  byte[] rk = Bytes.toBytes("rk-" + i);
>  Get get = new Get(rk);
>  get.addColumn(cf, c);
>  gets.add(get);
>  }
>  Result[] results = new Result[gets.size()];
>  table.batch(gets, results);
> {code}
> The log will look like the following:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] 

[jira] [Comment Edited] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369449#comment-17369449
 ] 

Zheng Wang edited comment on HBASE-26027 at 6/25/21, 1:00 PM:
--

 
{quote}Don't really get this:

If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason

how to reproduce it?
{quote}
 

{color:#172b4d}Added to the description, thanks for the comment.{color}


was (Author: filtertip):
 
{quote}Don't really get this:
{quote}If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason
{quote}
how to reproduce it?

 

{color:#172b4d}Added in description, thanks for the comment.{color}
{quote}

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The batch API of HTable takes a parameter named results, of type Object[], 
> that stores results or exceptions.
> If the user passes an array of another type, e.g. 
> org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
> it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult. As a result, 
> AsyncRequestFutureImpl.decActionCounter is skipped, and 
> AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
> again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead of 
> depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation has been refactored 
> in 3.x.
> How to reproduce:
> 1: add sleep in RSRpcServices.multi to mock slow response
> {code:java}
> try {
>  Thread.sleep(2000);
>  } catch (InterruptedException e) {
>  e.printStackTrace();
>  }
> {code}
> 2: set time out in config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call batch api
> {code:java}
> Table table = HbaseUtil.getTable("test");
>  byte[] cf = Bytes.toBytes("f");
>  byte[] c = Bytes.toBytes("c1");
>  List gets = new ArrayList<>();
>  for (int i = 0; i < 10; i++) {
>  byte[] rk = Bytes.toBytes("rk-" + i);
>  Get get = new Get(rk);
>  get.addColumn(cf, c);
>  gets.add(get);
>  }
>  Result[] results = new Result[gets.size()];
>  table.batch(gets, results);
> {code}
> The log will look like the following:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] 

[jira] [Comment Edited] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369449#comment-17369449
 ] 

Zheng Wang edited comment on HBASE-26027 at 6/25/21, 12:59 PM:
---

 
{quote}Don't really get this:
{quote}If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason
{quote}
how to reproduce it?

 

{color:#172b4d}Added to the description, thanks for the comment.{color}
{quote}


was (Author: filtertip):
{quote}Don't really get this:
{quote}If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason
{quote}
how to reproduce it?
{quote}
Added in description, thanks for the comment.[~reidchan]

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The batch API of HTable takes a parameter named results, of type Object[], 
> that stores results or exceptions.
> If the user passes an array of another type, e.g. 
> org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
> it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult. As a result, 
> AsyncRequestFutureImpl.decActionCounter is skipped, and 
> AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
> again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead of 
> depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation has been refactored 
> in 3.x.
> How to reproduce:
> 1: add sleep in RSRpcServices.multi to mock slow response
> {code:java}
> try {
>  Thread.sleep(2000);
>  } catch (InterruptedException e) {
>  e.printStackTrace();
>  }
> {code}
> 2: set time out in config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call batch api
> {code:java}
> Table table = HbaseUtil.getTable("test");
>  byte[] cf = Bytes.toBytes("f");
>  byte[] c = Bytes.toBytes("c1");
>  List gets = new ArrayList<>();
>  for (int i = 0; i < 10; i++) {
>  byte[] rk = Bytes.toBytes("rk-" + i);
>  Get get = new Get(rk);
>  get.addColumn(cf, c);
>  gets.add(get);
>  }
>  Result[] results = new Result[gets.size()];
>  table.batch(gets, results);
> {code}
> The log will look like the following:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] 

[jira] [Commented] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17369449#comment-17369449
 ] 

Zheng Wang commented on HBASE-26027:


{quote}Don't really get this:
{quote}If user pass an array with other type, eg: 
org.apache.hadoop.hbase.client.Result, and we need to put an exception into it 
by some reason
{quote}
how to reproduce it?
{quote}
Added to the description, thanks for the comment. [~reidchan]

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The batch API of HTable takes a parameter named results, of type Object[], 
> that stores results or exceptions.
> If the user passes an array of another type, e.g. 
> org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
> it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult. As a result, 
> AsyncRequestFutureImpl.decActionCounter is skipped, and 
> AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
> again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead of 
> depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation has been refactored 
> in 3.x.
> How to reproduce:
> 1: add sleep in RSRpcServices.multi to mock slow response
> {code:java}
> try {
>  Thread.sleep(2000);
>  } catch (InterruptedException e) {
>  e.printStackTrace();
>  }
> {code}
> 2: set time out in config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call batch api
> {code:java}
> Table table = HbaseUtil.getTable("test");
>  byte[] cf = Bytes.toBytes("f");
>  byte[] c = Bytes.toBytes("c1");
>  List gets = new ArrayList<>();
>  for (int i = 0; i < 10; i++) {
>  byte[] rk = Bytes.toBytes("rk-" + i);
>  Get get = new Get(rk);
>  get.addColumn(cf, c);
>  gets.add(get);
>  }
>  Result[] results = new Result[gets.size()];
>  table.batch(gets, results);
> {code}
> The log will look like the following:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}





[jira] [Updated] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26027:
---
Environment: (was: {code:java}
 {code})

> The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone 
> caused by ArrayStoreException
> -
>
> Key: HBASE-26027
> URL: https://issues.apache.org/jira/browse/HBASE-26027
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.2.7, 2.3.5, 2.4.4
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
>
> The batch API of HTable takes a parameter named results, of type Object[], 
> that stores results or exceptions.
> If the user passes an array of another type, e.g. 
> org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
> it for some reason, an ArrayStoreException is thrown in 
> AsyncRequestFutureImpl.updateResult. As a result, 
> AsyncRequestFutureImpl.decActionCounter is skipped, and 
> AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
> again and again, forever.
> It would be better to add a cutoff calculated from operationTimeout, instead of 
> depending only on the value of actionsInProgress.
> BTW, this issue only affects 2.x, since the implementation has been refactored 
> in 3.x.
> How to reproduce:
> 1: add sleep in RSRpcServices.multi to mock slow response
> {code:java}
> try {
>  Thread.sleep(2000);
>  } catch (InterruptedException e) {
>  e.printStackTrace();
>  }
> {code}
> 2: set time out in config
> {code:java}
> conf.set("hbase.rpc.timeout","2000");
> conf.set("hbase.client.operation.timeout","6000");
> {code}
> 3: call batch api
> {code:java}
> Table table = HbaseUtil.getTable("test");
>  byte[] cf = Bytes.toBytes("f");
>  byte[] c = Bytes.toBytes("c1");
>  List gets = new ArrayList<>();
>  for (int i = 0; i < 10; i++) {
>  byte[] rk = Bytes.toBytes("rk-" + i);
>  Get get = new Get(rk);
>  get.addColumn(cf, c);
>  gets.add(get);
>  }
>  Result[] results = new Result[gets.size()];
>  table.batch(gets, results);
> {code}
> The log will look like the following:
> {code:java}
> [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - 
> id=1 error for test processing localhost,16020,1624343786295
> java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
>   at 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>   at java.util.concurrent.FutureTask.run(FutureTask.java)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to 
> finish on table: 
> [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to 
> finish on table: test
> [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to 
> finish on table: test
> {code}





[jira] [Updated] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException

2021-06-25 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26027:
---
Description: 
The batch API of HTable takes a parameter named results, of type Object[], 
that stores results or exceptions.

If the user passes an array of another type, e.g. 
org.apache.hadoop.hbase.client.Result[], and an exception has to be stored in 
it for some reason, an ArrayStoreException is thrown in 
AsyncRequestFutureImpl.updateResult. As a result, 
AsyncRequestFutureImpl.decActionCounter is skipped, and 
AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress 
again and again, forever.

It would be better to add a cutoff calculated from operationTimeout, instead of 
depending only on the value of actionsInProgress.

BTW, this issue only affects 2.x, since the implementation has been refactored 
in 3.x.

How to reproduce:

1: Add a sleep in RSRpcServices.multi to simulate a slow response

{code:java}
try {
  Thread.sleep(2000);
} catch (InterruptedException e) {
  e.printStackTrace();
}
{code}


2: Set the timeouts in the config

{code:java}
conf.set("hbase.rpc.timeout","2000");
conf.set("hbase.client.operation.timeout","6000");
{code}


3: Call the batch API

{code:java}
Table table = HbaseUtil.getTable("test");
byte[] cf = Bytes.toBytes("f");
byte[] c = Bytes.toBytes("c1");
List<Get> gets = new ArrayList<>();
for (int i = 0; i < 10; i++) {
  byte[] rk = Bytes.toBytes("rk-" + i);
  Get get = new Get(rk);
  get.addColumn(cf, c);
  gets.add(get);
}
// A Result[] is passed where Object[] is expected; this triggers the issue
Result[] results = new Result[gets.size()];
table.batch(gets, results);
{code}


The log will look like the following:
{code:java}
[ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - id=1 error for test processing localhost,16020,1624343786295
java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69)
  at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
  at java.util.concurrent.FutureTask.run(FutureTask.java)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
[INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10  actions to finish on table: 
[INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10  actions to finish on table: test
[INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10  actions to finish on table: test
{code}

  was:
The batch API of HTable takes a parameter named results that stores each 
result or exception; its type is Object[].

If the user passes an array of a narrower type, e.g. 
org.apache.hadoop.hbase.client.Result[], and we need to put an exception into 
it for some reason, an ArrayStoreException is thrown in 
AsyncRequestFutureImpl.updateResult. AsyncRequestFutureImpl.decActionCounter 
is then skipped, so AsyncRequestFutureImpl.waitUntilDone gets stuck checking 
actionsInProgress again and again, forever.
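
The root cause is Java's covariant arrays: a Result[] can be passed where an Object[] is expected, but storing anything that is not a Result into it fails at runtime. The following is a minimal, self-contained sketch of that behaviour only; the Result class and the tryStore helper are hypothetical stand-ins and are not HBase code.

```java
import java.io.IOException;

public class ArrayStoreDemo {
    // Hypothetical stand-in for org.apache.hadoop.hbase.client.Result.
    static class Result {}

    // Mimics a callee that only sees the array as Object[] and tries to
    // store either a result or an exception into a slot.
    static boolean tryStore(Object[] results, int index, Object value) {
        try {
            results[index] = value;
            return true;
        } catch (ArrayStoreException e) {
            // The runtime element type of the array rejected the value.
            return false;
        }
    }

    public static void main(String[] args) {
        Exception failure = new IOException("simulated failure");

        Object[] typed = new Result[1];   // compiles because arrays are covariant
        Object[] untyped = new Object[1]; // what the batch API expects

        System.out.println("Result[] accepts exception: " + tryStore(typed, 0, failure));
        System.out.println("Object[] accepts exception: " + tryStore(untyped, 0, failure));
    }
}
```

Passing a plain Object[] as the results array avoids the exception on the caller side, independently of any change in the client.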

It would be better to add a cutoff calculated from operationTimeout, instead 
of depending only on the value of actionsInProgress.
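
As a rough illustration of the cutoff idea, the sketch below bounds a wait loop by a deadline derived from a timeout, so a counter that is never decremented cannot block the caller forever. The class, the method signature and the 10 ms polling interval are assumptions made for this example; this is not the actual AsyncRequestFutureImpl code or the eventual patch.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BoundedWait {
    // Hypothetical helper: returns true if the counter reached zero,
    // false if the timeout elapsed (or the thread was interrupted) first.
    static boolean waitUntilDone(AtomicLong actionsInProgress, long operationTimeoutMs) {
        long deadlineNs = System.nanoTime() + operationTimeoutMs * 1_000_000L;
        while (actionsInProgress.get() > 0) {
            long remainingMs = (deadlineNs - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // cutoff reached, stop waiting on the counter
            }
            try {
                // Poll at most every 10 ms, never sleeping past the deadline.
                Thread.sleep(Math.min(10L, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicLong stuck = new AtomicLong(10); // never decremented, as in the bug
        System.out.println("finished: " + waitUntilDone(stuck, 200));
    }
}
```

With such a bound, the caller gets control back after the timeout and can report an error instead of logging "waiting for 10  actions to finish" indefinitely.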

BTW, this issue only affects 2.x, since the implementation was refactored in 3.x.
{code:java}
[ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - id=1 
error for test processing localhost,16020,1624343786295
java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException
at 

[jira] [Updated] (HBASE-26028) The view as json page shows exception when using TinyLfuBlockCache

2021-06-24 Thread Zheng Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Wang updated HBASE-26028:
---
Attachment: HBASE-26028-beforepatch.jpg
HBASE-26028-afterpatch.jpg

> The view as json page shows exception when using TinyLfuBlockCache
> --
>
> Key: HBASE-26028
> URL: https://issues.apache.org/jira/browse/HBASE-26028
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Attachments: HBASE-26028-afterpatch.jpg, HBASE-26028-beforepatch.jpg
>
>
> Some variables in TinyLfuBlockCache should be marked as transient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

