[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751079#comment-17751079
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


Just thinking about it a little bit more.

What is the use case for a negative capacity limit factor?

With a negative factor, the following check in LruAdaptiveBlockCache will always be true, and the 
block won't be written into the cache:
{code:java}
if (currentSize >= hardLimitSize) {code}
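
To illustrate why that check can never be false with a negative factor, here is a minimal standalone sketch. It assumes the hard limit is obtained by multiplying a non-negative acceptable size by the configured factor; the class name, the method names, and the sample sizes are hypothetical and are not copied from LruAdaptiveBlockCache.

```java
public class HardLimitSketch {
  // Assumption: the hard limit is the acceptable size scaled by the configured factor.
  public static long hardLimitSize(long acceptableSize, float hardCapacityLimitFactor) {
    return (long) (hardCapacityLimitFactor * acceptableSize);
  }

  // Mirrors the quoted check: a result of true means the block is not cached.
  public static boolean rejectsBlock(long currentSize, long acceptableSize, float factor) {
    return currentSize >= hardLimitSize(acceptableSize, factor);
  }

  public static void main(String[] args) {
    long acceptableSize = 1024L * 1024L; // hypothetical 1 MiB acceptable size

    // A negative factor gives a negative limit, so even an empty cache is "over" it.
    System.out.println(rejectsBlock(0L, acceptableSize, -0.4921875f));

    // A positive factor leaves room for new blocks while the cache is empty.
    System.out.println(rejectsBlock(0L, acceptableSize, 1.2f));
  }
}
```

Since the current size is never negative, any factor below zero would reject every block under this assumption, which would explain the later cache miss in the test.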
 

>  Missing null check in TestHFile
> 
>
> Key: HBASE-27995
> URL: https://issues.apache.org/jira/browse/HBASE-27995
> Project: HBase
>  Issue Type: Bug
>Reporter: ConfX
>Priority: Critical
> Attachments: HBASE-27995.patch, reproduce.sh
>
>
> h2. What happened
> After setting 
> {{hbase.lru.blockcache.hard.capacity.limit.factor=-0.4921875}}, running 
> test 
> {{org.apache.hadoop.hbase.io.hfile.TestHFile#testReaderWithAdaptiveLruCombinedBlockCache}} 
> results in a null pointer exception.
> h2. Where's the problem
> In the test:
> {noformat}
>       cachedBlock = combined.getBlock(key, false, false, true);
>       try {
> ...
>       } finally {
>         cachedBlock.release();
>       }{noformat}
> However, cachedBlock might not be initialized properly and could be null, 
> causing an unhandled NullPointerException.
> h2. How to reproduce
>  # Set {{hbase.lru.blockcache.hard.capacity.limit.factor}} to {{-0.4921875}}.
>  # Run 
> {{org.apache.hadoop.hbase.io.hfile.TestHFile#testReaderWithAdaptiveLruCombinedBlockCache}}.
> You should observe:
> {noformat}
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.io.hfile.TestHFile.testReaderCombinedCache(TestHFile.java:1052)
>     at org.apache.hadoop.hbase.io.hfile.TestHFile.testReaderWithAdaptiveLruCombinedBlockCache(TestHFile.java:1011){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-27995:
---
Attachment: HBASE-27995.patch



[jira] [Comment Edited] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751004#comment-17751004
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-27995 at 8/4/23 8:05 AM:


Does it make sense to return an empty block instead of null when there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}
That would require a change in every implementation of the BlockCache 
interface, and in a few places we would need to change
{code:java}
result = getBlock(...);
if (null != result) ...
{code}
to check whether the result is empty (or whether it is null or empty).
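
A rough sketch of what the "empty block instead of null" idea could look like is given below. All of the types here (Block, SketchCache, the EMPTY sentinel) are hypothetical stand-ins invented for illustration; they are not the HBase Cacheable or BlockCache API, and the real change would have to deal with reference counting and every BlockCache implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptyBlockSketch {
  /** Hypothetical stand-in for a cached block; not the HBase Cacheable type. */
  public interface Block {
    boolean isEmpty();
    void release();
  }

  /** Shared sentinel returned on a cache miss; releasing it does nothing. */
  public static final Block EMPTY = new Block() {
    @Override public boolean isEmpty() { return true; }
    @Override public void release() { }
  };

  /** Hypothetical cache whose lookup never returns null. */
  public static class SketchCache {
    private final Map<String, Block> blocks = new HashMap<>();

    public void put(String key, Block block) {
      blocks.put(key, block);
    }

    public Block getBlock(String key) {
      return blocks.getOrDefault(key, EMPTY);
    }
  }

  /** Caller-side pattern: reports a hit or miss, and release() is always safe to call. */
  public static boolean readAndRelease(SketchCache cache, String key) {
    Block result = cache.getBlock(key);
    try {
      return !result.isEmpty();
    } finally {
      result.release();
    }
  }
}
```

With a sentinel like this, callers replace the null comparison with an isEmpty() check, and an unconditional release() in a finally block no longer throws on a miss.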


was (Author: ryakhovskiy.k):
does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}
then in a few places we will need to change
{code} result = getBlock(...); 
if (null != result) ...
{code}
to check if the result is empty (or null or empty)



[jira] [Comment Edited] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751004#comment-17751004
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-27995 at 8/4/23 8:00 AM:


does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}
then in a few places we will need to change
{code} result = getBlock(...); 
if (null != result) ...
{code}
to check if the result is empty (or null or empty)


was (Author: ryakhovskiy.k):
does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ https://issues.apache.org/jira/browse/HBASE-27995 ]


Konstantin Ryakhovskiy deleted comment on HBASE-27995:


was (Author: ryakhovskiy.k):
does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751006#comment-17751006
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751005#comment-17751005
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ https://issues.apache.org/jira/browse/HBASE-27995 ]


Konstantin Ryakhovskiy deleted comment on HBASE-27995:


was (Author: ryakhovskiy.k):
does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17751004#comment-17751004
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


does it make sense to return an empty block instead of null, if there is a 
cache miss?
{code:java}
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BlockCacheKey key, 
boolean caching, boolean repeat, boolean updateCacheMetrics){code}



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750994#comment-17750994
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


PR #5343 has been created.



[jira] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ https://issues.apache.org/jira/browse/HBASE-27995 ]


Konstantin Ryakhovskiy deleted comment on HBASE-27995:


was (Author: ryakhovskiy.k):
PR #5343 has been created.



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750993#comment-17750993
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


PR #5343 has been created.



[jira] [Commented] (HBASE-27995) Missing null check in TestHFile

2023-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17750987#comment-17750987
 ] 

Konstantin Ryakhovskiy commented on HBASE-27995:


But the test will then fail with an AssertionError on the assertNotNull line:
{code:java}
      cachedBlock = combined.getBlock(key, false, false, true);
      try {
        Assert.assertNotNull(cachedBlock);
        ...
      } finally {
        if (null != cachedBlock) cachedBlock.release();
      }
{code}
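
The difference between the two variants can be sketched in isolation. The following is a self-contained illustration, not the TestHFile code: the Block interface and the hand-written assertNotNull helper are hypothetical substitutes so the example runs without JUnit or HBase on the classpath.

```java
public class NullGuardSketch {
  /** Hypothetical stand-in for the cached block type. */
  public interface Block {
    void release();
  }

  /** Minimal replacement for a JUnit-style assertNotNull. */
  static void assertNotNull(Object value) {
    if (value == null) {
      throw new AssertionError("expected a non-null cached block");
    }
  }

  /** Original shape: an unconditional release in finally throws NPE on a cache miss. */
  public static String runUnguarded(Block cachedBlock) {
    try {
      try {
        // test body elided
      } finally {
        cachedBlock.release();
      }
      return "passed";
    } catch (NullPointerException e) {
      return "NullPointerException";
    }
  }

  /** Patched shape: the null guard avoids the NPE, but the assertion still fails on a miss. */
  public static String runGuarded(Block cachedBlock) {
    try {
      try {
        assertNotNull(cachedBlock);
      } finally {
        if (cachedBlock != null) {
          cachedBlock.release();
        }
      }
      return "passed";
    } catch (AssertionError e) {
      return "AssertionError";
    }
  }
}
```

In other words, the null guard only changes how the test fails when the block was never cached; it does not make the test pass under the negative factor.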



[jira] [Commented] (HBASE-27277) TestRaceBetweenSCPAndTRSP fails in pre commit

2022-08-22 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582827#comment-17582827
 ] 

Konstantin Ryakhovskiy commented on HBASE-27277:


Cannot reproduce on master after 100 executions:
{code:java}
for i in {1..100}; do mvn test -Dtest=TestRaceBetweenSCPAndTRSP; done {code}

> TestRaceBetweenSCPAndTRSP fails in pre commit
> -
>
> Key: HBASE-27277
> URL: https://issues.apache.org/jira/browse/HBASE-27277
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Duo Zhang
>Priority: Major
>
> It seems the PE worker is stuck here. Need to dig more.
> {noformat}
> "PEWorker-5" daemon prio=5 tid=326 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at java.base@11.0.10/jdk.internal.misc.Unsafe.park(Native Method)
> at java.base@11.0.10/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> at java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
> at java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
> at java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
> at java.base@11.0.10/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
> at app//org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP$AssignmentManagerForTest.getRegionsOnServer(TestRaceBetweenSCPAndTRSP.java:97)
> at app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.getRegionsOnCrashedServer(ServerCrashProcedure.java:288)
> at app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:195)
> at app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:66)
> at app//org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
> at app//org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:919)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1962)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread$$Lambda$477/0x000800ac1840.call(Unknown Source)
> at app//org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
> at app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1989)
> {noformat}





[jira] [Commented] (HBASE-27277) TestRaceBetweenSCPAndTRSP fails in pre commit

2022-08-18 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581557#comment-17581557
 ] 

Konstantin Ryakhovskiy commented on HBASE-27277:


Cannot reproduce on master after 5 attempts:
{code:java}
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.741 s - in 
org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP {code}

> TestRaceBetweenSCPAndTRSP fails in pre commit
> -
>
> Key: HBASE-27277
> URL: https://issues.apache.org/jira/browse/HBASE-27277
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Duo Zhang
>Priority: Major
>
> Seems the PE worker is stuck here. Need dig more.
> {noformat}
> "PEWorker-5" daemon prio=5 tid=326 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at java.base@11.0.10/jdk.internal.misc.Unsafe.park(Native Method)
> at 
> java.base@11.0.10/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
> at 
> java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
> at 
> java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
> at 
> java.base@11.0.10/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
> at 
> java.base@11.0.10/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
> at 
> app//org.apache.hadoop.hbase.master.assignment.TestRaceBetweenSCPAndTRSP$AssignmentManagerForTest.getRegionsOnServer(TestRaceBetweenSCPAndTRSP.java:97)
> at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.getRegionsOnCrashedServer(ServerCrashProcedure.java:288)
> at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:195)
> at 
> app//org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:66)
> at 
> app//org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
> at 
> app//org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:919)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1650)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1396)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:75)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:1962)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread$$Lambda$477/0x000800ac1840.call(Unknown
>  Source)
> at 
> app//org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216)
> at 
> app//org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1989)
> {noformat}





[jira] [Assigned] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-07 Thread Konstantin Ryakhovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy reassigned HBASE-26775:
--

Assignee: Konstantin Ryakhovskiy

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Konstantin Ryakhovskiy
>Priority: Major
> Attachments: HBASE-26775.patch
>
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}
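
The trace above shows ProcedureEvent.toString() formatting an ArrayDeque while another thread is modifying it. A minimal sketch of one common way to avoid that, copying the deque under a lock before formatting it, follows. The class SnapshotToString and its methods are hypothetical illustrations and are not taken from HBASE-26775.patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an event object that keeps a queue of waiters.
final class SnapshotToString {
  private final ArrayDeque<String> waiters = new ArrayDeque<>();

  // All mutations go through the same monitor as the snapshot in toString().
  synchronized void add(String name) {
    waiters.addLast(name);
  }

  synchronized String poll() {
    return waiters.pollFirst();
  }

  @Override
  public String toString() {
    List<String> snapshot;
    synchronized (this) {
      // Copy while holding the lock, so the deque cannot change mid-iteration.
      snapshot = new ArrayList<>(waiters);
    }
    // Format the private copy outside the lock.
    return "SnapshotToString(waiters=" + snapshot + ")";
  }

  public static void main(String[] args) throws InterruptedException {
    SnapshotToString event = new SnapshotToString();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        event.add("proc-" + i);
        event.poll();
      }
    });
    writer.start();
    while (writer.isAlive()) {
      event.toString(); // iterates a copy, never the live deque
    }
    writer.join();
    System.out.println(event);
  }
}
```

Calling toString() directly on the live deque from a second thread, as the test thread in the trace does, gives no such guarantee, because ArrayDeque is not thread-safe.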





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-07 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576394#comment-17576394
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Opened a PR: [https://github.com/apache/hbase/pull/4681]

Thanks.

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-26775.patch
>
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-05 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575692#comment-17575692
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Do you think there should be a separate PR for each 2.x branch (branch-2, branch-2.0, branch-2.1, etc.)?

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-26775.patch
>
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-05 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575634#comment-17575634
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Patch attached.

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-26775.patch
>
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Updated] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-05 Thread Konstantin Ryakhovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-26775:
---
Attachment: HBASE-26775.patch

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
> Attachments: HBASE-26775.patch
>
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575298#comment-17575298
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Should this one be closed then?

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575297#comment-17575297
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Got it. Indeed, I tested it and the results are good on master:
{code:java}
[INFO] ---
[INFO]  T E S T S
[INFO] ---
[INFO] Running org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.703 s - in org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
{code}

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Commented] (HBASE-26775) TestProcedureSchedulerConcurrency fails in pre commit

2022-08-04 Thread Konstantin Ryakhovskiy (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575271#comment-17575271
 ] 

Konstantin Ryakhovskiy commented on HBASE-26775:


Hi, which branch is impacted?

Can you please share a link to the build result?

Thanks.

> TestProcedureSchedulerConcurrency fails in pre commit
> -
>
> Key: HBASE-26775
> URL: https://issues.apache.org/jira/browse/HBASE-26775
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Priority: Major
>
> Saw this on the jenkins page. Seems like a test issue.
> {noformat}
> Exception in thread "Thread-10" java.util.ConcurrentModificationException
>   at java.base/java.util.ArrayDeque.nonNullElementAt(ArrayDeque.java:271)
>   at java.base/java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:701)
>   at 
> java.base/java.util.AbstractCollection.toString(AbstractCollection.java:472)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.ProcedureEvent.toString(ProcedureEvent.java:134)
>   at java.base/java.lang.String.valueOf(String.java:2951)
>   at java.base/java.lang.StringBuilder.append(StringBuilder.java:168)
>   at 
> org.apache.hadoop.hbase.procedure2.TestProcedureSchedulerConcurrency$2.run(TestProcedureSchedulerConcurrency.java:130)
> {noformat}





[jira] [Assigned] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2021-11-30 Thread Konstantin Ryakhovskiy (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy reassigned HBASE-16142:
--

Assignee: (was: Konstantin Ryakhovskiy)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: Michael Stack
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch, 
> HBASE-16142.master.004.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.
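
As a rough illustration of the idea in the description above, the sketch below starts a flight recording once a queue-depth threshold is crossed and dumps it into a log directory. It assumes JDK 11 or later and uses the jdk.jfr API instead of the MBean invocations the description mentions. The class JfrOnDuress, the method recordIfUnderDuress, and the sample numbers are hypothetical and are not taken from the attached patches.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Hypothetical helper: not part of HBase.
final class JfrOnDuress {
  /**
   * Starts a flight recording when queueDepth exceeds threshold, keeps it
   * running for the given length of time, then dumps it into logDir.
   * Returns the dump file, or null when the threshold was not exceeded.
   */
  static Path recordIfUnderDuress(int queueDepth, int threshold, Path logDir, Duration length) {
    if (queueDepth <= threshold) {
      return null; // not under duress, nothing to record
    }
    Path out = logDir.resolve("duress-" + System.currentTimeMillis() + ".jfr");
    // "default" is one of the predefined JFR configurations shipped with the JDK.
    try (Recording recording = new Recording(Configuration.getConfiguration("default"))) {
      recording.start();
      Thread.sleep(length.toMillis());
      recording.stop();
      recording.dump(out);
    } catch (IOException | ParseException e) {
      throw new IllegalStateException("JFR recording failed", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while recording", e);
    }
    return out;
  }

  public static void main(String[] args) {
    Path logDir = Path.of(System.getProperty("java.io.tmpdir"));
    // Hypothetical sample values: queue depth 250 against a threshold of 100.
    Path dump = recordIfUnderDuress(250, 100, logDir, Duration.ofSeconds(2));
    System.out.println("recording written to " + dump);
  }
}
```

A real trigger would read the queue depth from a published metric and would probably record asynchronously rather than blocking the caller; the blocking sleep here only keeps the sketch short.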



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-26 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Status: Patch Available  (was: Open)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch, 
> HBASE-16142.master.004.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-26 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: HBASE-16142.master.004.patch

- add more comments with links to JMC and JFR
- modify log-directory initialization, use environment variable HBASE_LOG_DIR
- add more info to the printUsage() method


> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch, 
> HBASE-16142.master.004.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.





[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-26 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Status: Open  (was: Patch Available)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.





[jira] [Commented] (HBASE-15625) Make minimum values configurable and smaller

2016-07-26 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394410#comment-15394410
 ] 

Konstantin Ryakhovskiy commented on HBASE-15625:


[~asher] are you working on this issue?
If not, can I reassign it to myself and submit a patch?
Thanks

> Make minimum values configurable and smaller
> 
>
> Key: HBASE-15625
> URL: https://issues.apache.org/jira/browse/HBASE-15625
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Jean-Marc Spaggiari
>Assignee: Asher Bartch
>Priority: Minor
>  Labels: beginner
>
> When we start a RS, we check
> HConstants.HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD to make sure we always keep
> 20% of the heap for HBase (See below). In the past, the maximum heap size was
> about 20GB, which means 4GB for HBase.
> Today, with huge heaps and G1 GC, 20% gives a lot to HBase. For example, with an
> 80GB heap, it gives 16GB, which I think is not required.
> We need to make HBASE_CLUSTER_MINIMUM_MEMORY_THRESHOLD configurable and lower
> its default value to 10%. It will not make any difference to any HBase
> configuration but will allow admins to be more flexible.
> Same thing for the minimum memstore and blockcache sizes.
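
The arithmetic in the description can be checked with a short sketch. The class HeapReserve and the method reservedBytes are hypothetical names used only for illustration; this is not the HBase check itself, only the percentage calculation applied to the heap sizes quoted above.

```java
// Hypothetical helper: reproduces the numbers quoted in the issue description.
final class HeapReserve {
  static final long GIB = 1024L * 1024L * 1024L;

  /** Bytes of heap covered by a minimum-memory threshold given in whole percent. */
  static long reservedBytes(long heapBytes, int thresholdPercent) {
    if (heapBytes < 0 || thresholdPercent < 0 || thresholdPercent > 100) {
      throw new IllegalArgumentException("heap must be >= 0 and threshold within 0..100");
    }
    return heapBytes * thresholdPercent / 100;
  }

  public static void main(String[] args) {
    // 20% threshold: a 20 GB heap versus an 80 GB heap.
    System.out.println(reservedBytes(20 * GIB, 20) / GIB + " GB of a 20 GB heap");
    System.out.println(reservedBytes(80 * GIB, 20) / GIB + " GB of an 80 GB heap");
    // Proposed 10% default on the same 80 GB heap.
    System.out.println(reservedBytes(80 * GIB, 10) / GIB + " GB of an 80 GB heap at 10%");
  }
}
```

With the proposed 10% default, the same 80 GB heap would reserve 8 GB instead of 16 GB.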





[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-25 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391663#comment-15391663
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


[~stack] what do you mean?
Should I add something, like a button to trigger JFR?
Or do you mean that the config set has to be included in the trace, i.e. that
additional information has to be dumped alongside the JFR file? Or do we need to
include some additional metrics?

Can you please provide more details about your idea?



> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-21 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387580#comment-15387580
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


The failed tests are not related to the patch, since the patch does not touch
existing functionality.

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.





[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-20 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385964#comment-15385964
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


The failed tests are not related to the patch, since the patch does not touch
existing functionality.

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>
> Chatting today w/ a mighty hbase operator on how to figure what is happening 
> during transitory latency spike or any other transitory 'weirdness' in a 
> server, the idea came up that a java flight recording during a spike would 
> include a pretty good picture of what is going on during the time of duress 
> (more ideal would be a trace of the explicit slow queries showing call stack 
> with timings dumped to a sink for later review; i.e. trigger an htrace when a 
> query is slow...).
> Taking a look, programmatically triggering a JFR recording seems doable, if 
> awkward (MBean invocations). There is even a means of specifying 'triggers' 
> based off any published mbean emission -- e.g. a query queue count threshold 
> -- which looks nice. See 
> https://community.oracle.com/thread/3676275?start=0=0 and 
> https://docs.oracle.com/javacomponents/jmc-5-4/jfr-runtime-guide/run.htm#JFRUH184
> This feature could start out as a blog post describing how to do it for one 
> server. A plugin on Canary that looks at mbean values and if over a 
> configured threshold, triggers a recording remotely could be next. Finally 
> could integrate a couple of triggers that fire when issue via the trigger 
> mechanism.
> Marking as beginner feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-20 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385493#comment-15385493
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


Regarding imports: 
I found an OpenJDK ticket to implement some JFR events for the JavaFX runtime: 
https://bugs.openjdk.java.net/browse/JDK-8098161
So they also use classes from com.oracle.jrockit.jfr... 

There is also an open question regarding licensing and the 
UnlockCommercialFeatures JVM option.

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-20 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: HBASE-16142.master.003.patch

licenses added

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-20 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Status: Patch Available  (was: Open)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch, HBASE-16142.master.003.patch
>
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-20 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Status: Open  (was: Patch Available)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch
>
>


[jira] [Comment Edited] (HBASE-16230) Calling 'get' in hbase shell with table name that doesn't exist causes it to hang for long time

2016-07-18 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382481#comment-15382481
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-16230 at 7/18/16 3:47 PM:
-

[~mantonov], yes, I first tested that with the Java API.
I have just made a fresh build from branch-1.3: mvn -DskipTests package 
assembly:single
Then I started HBase and the HBase shell, ran the command above (get 
'table_that_doesnt_exist', 'x') and got the following:
{code}
hbase(main):003:0> get 'table_that_doesnt_exist', 'x'
COLUMN  CELL



ERROR: Unknown table table_that_doesnt_exist!
{code}
I also tried creating a table, disabling it and then running get: the same 
result, an exception as expected.
I also tried dropping the table afterwards and running get, and got the same 
result, an exception as expected.

Still cannot reproduce on branch-1.3.


was (Author: ryakhovskiy.k):
[~mantonov], yes, first I have tested that with java api.
I have just made a fresh build out of branch-1.3: mvn -DskipTests package 
assembly:single
then, started hbase, then hbase-shell, run the command above (get 
'table_that_doesnt_exist', 'x') and got following:

hbase(main):003:0> get 'table_that_doesnt_exist', 'x'
COLUMN  CELL



ERROR: Unknown table table_that_doesnt_exist!

I have also tried to create table, disable and then get -- the same result, 
exception as expected.
I also tried to drop the table afterwards and got the same result, exception as 
expected.

Still cannot reproduce on branch-1.3.

> Calling 'get' in hbase shell with table name that doesn't exist causes it to 
> hang for long time
> ---
>
> Key: HBASE-16230
> URL: https://issues.apache.org/jira/browse/HBASE-16230
> Project: HBase
>  Issue Type: Bug
>  Components: Client, shell
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>
> get 'table_that_doesnt_exist', 'x'
> hangs for duration that looks more like rpc timeout, then says:
> ERROR: HRegionInfo was null in 





[jira] [Commented] (HBASE-16230) Calling 'get' in hbase shell with table name that doesn't exist causes it to hang for long time

2016-07-18 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382481#comment-15382481
 ] 

Konstantin Ryakhovskiy commented on HBASE-16230:


[~mantonov], yes, I first tested that with the Java API.
I have just made a fresh build from branch-1.3: mvn -DskipTests package 
assembly:single
Then I started HBase and the HBase shell, ran the command above (get 
'table_that_doesnt_exist', 'x') and got the following:

hbase(main):003:0> get 'table_that_doesnt_exist', 'x'
COLUMN  CELL



ERROR: Unknown table table_that_doesnt_exist!

I also tried creating a table, disabling it and then running get: the same 
result, an exception as expected.
I also tried dropping the table afterwards and got the same result, an 
exception as expected.

Still cannot reproduce on branch-1.3.

> Calling 'get' in hbase shell with table name that doesn't exist causes it to 
> hang for long time
> ---
>
> Key: HBASE-16230
> URL: https://issues.apache.org/jira/browse/HBASE-16230
> Project: HBase
>  Issue Type: Bug
>  Components: Client, shell
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>


[jira] [Commented] (HBASE-16230) Calling 'get' in hbase shell with table name that doesn't exist causes it to hang for long time

2016-07-14 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378316#comment-15378316
 ] 

Konstantin Ryakhovskiy commented on HBASE-16230:


I have written a simple straightforward test for that issue using mini cluster 
on branch-1.3, but {{nonExistingTable.get(get)}} throws a 
{{TableNotFoundException}} as expected.
Could it be a configuration issue?

> Calling 'get' in hbase shell with table name that doesn't exist causes it to 
> hang for long time
> ---
>
> Key: HBASE-16230
> URL: https://issues.apache.org/jira/browse/HBASE-16230
> Project: HBase
>  Issue Type: Bug
>  Components: Client, shell
>Affects Versions: 1.3.0
>Reporter: Mikhail Antonov
>


[jira] [Assigned] (HBASE-16224) Reduce the number of RPCs for the large PUTs

2016-07-14 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy reassigned HBASE-16224:
--

Assignee: Konstantin Ryakhovskiy

> Reduce the number of RPCs for the large PUTs
> 
>
> Key: HBASE-16224
> URL: https://issues.apache.org/jira/browse/HBASE-16224
> Project: HBase
>  Issue Type: Improvement
>Reporter: ChiaPing Tsai
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
> Attachments: HBASE-16224-v1.patch, HBASE-16224-v2.patch, 
> HBASE-16224-v3.patch
>
>
> This patch proposes to reduce the number of RPCs for large PUTs.
> The number and data size of write threads (SingleServerRequestRunnable) is 
> the result of three main factors:
> 1) The flush size taken by BufferedMutatorImpl#backgroundFlushCommits
> 2) The limit on the number of tasks
> 3) ClientBackoffPolicy
> Many threads carrying few mutations are created for two reasons: 1) 
> many regions of the target table are on different servers; 2) the flush size 
> in step one is summed over "all" servers rather than per "individual" server.
> This patch removes the flush size limit in step one and adds a maximum size 
> to submit for each server in the AsyncProcess.





[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-12 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: HBASE-16142.master.002.patch

modified imports

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch, 
> HBASE-16142.master.002.patch
>
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-11 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: HBASE-16142.master.001.patch

Added class JavaFlightRecorder and a test.
The class has a main method to be called with command line arguments:
--duration= [--output=] 
If output is not provided, the JFR trace will be dumped into the HBase log 
directory.
The JVM option UnlockCommercialFeatures has to be enabled.

I do not know how to test whether UnlockCommercialFeatures is disabled or 
enabled by default. Currently, the tests are just skipped if the option is 
disabled.

The patch on the review board:
https://reviews.apache.org/r/49888/
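
For the question of how a test could find out whether UnlockCommercialFeatures 
is enabled, here is a minimal sketch. It is not taken from the patch: the class 
and method names (VmFlagProbe, isFlagEnabled) are hypothetical, and it assumes a 
HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean. A JVM 
that does not know the flag at all is treated as "disabled" here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the attached patch.
final class VmFlagProbe {

  private VmFlagProbe() {
  }

  /**
   * Returns true only if the running JVM knows the given boolean flag
   * and reports its current value as "true".
   */
  static boolean isFlagEnabled(String flagName) {
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    try {
      return Boolean.parseBoolean(bean.getVMOption(flagName).getValue());
    } catch (IllegalArgumentException e) {
      // getVMOption rejects flags that this JVM does not define.
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("UnlockCommercialFeatures enabled: "
        + isFlagEnabled("UnlockCommercialFeatures"));
  }
}
```

A test could call such a helper from a JUnit assumption so that it is reported 
as skipped rather than failed when the flag is off.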


 

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch
>
>


[jira] [Issue Comment Deleted] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-11 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Comment: was deleted

(was: default JFR tracing added as java-class with main method.
This works when commercial features are enabled.
How should we proceed with testing?
it does not make sense to fail test if commercial features are disabled by 
default.
from the other perspective, the test should fail, if commercial features are 
enabled by default, but not enabled for particular run)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-11 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: (was: HBASE-16142.master.001.patch)

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
>


[jira] [Comment Edited] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-11 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370334#comment-15370334
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-16142 at 7/11/16 7:52 AM:
-

Default JFR tracing added as a Java class with a main method.
This works when commercial features are enabled.
How should we proceed with testing?
It does not make sense to fail the test if commercial features are disabled by 
default.
On the other hand, the test should fail if commercial features are enabled by 
default but not enabled for a particular run.


was (Author: ryakhovskiy.k):
default JFR tracing added as java-class with main method

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch
>
>


[jira] [Updated] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-11 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16142:
---
Attachment: HBASE-16142.master.001.patch

default JFR tracing added as java-class with main method

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-16142.master.001.patch
>
>


[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369695#comment-15369695
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


I think http://hirt.se/blog/?p=513 is something different; as I understand it, 
it is a custom JVM performance counter.
On the other hand, the FlightRecorder bean is a kind of default set of 
performance counters provided by HotSpot.
HotSpotDiagnosticMXBean is part of the Management API: 
http://docs.oracle.com/javase/7/docs/api/java/lang/management/package-summary.html
There is a factory, ManagementFactory, and we can obtain a bean through it.

I think it would be a good start to:
1) use the default counters provided by FlightRecorderMXBean
2) if that is not enough, implement some custom performance counters later on
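
To illustrate the ManagementFactory lookup mentioned above, here is a small 
sketch that uses only JDK classes. The class name DiagnosticBeanLookup is 
hypothetical. It assumes a HotSpot-based JVM; the bean interface itself lives 
in com.sun.management rather than java.lang.management. It only lists the 
diagnostic VM options and does not start a flight recording.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical example class; shows the platform MXBean lookup only.
final class DiagnosticBeanLookup {

  private DiagnosticBeanLookup() {
  }

  /** Obtains the HotSpot diagnostic bean from the platform MBean server. */
  static HotSpotDiagnosticMXBean lookup() {
    return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
  }

  public static void main(String[] args) {
    List<VMOption> options = lookup().getDiagnosticOptions();
    // Print each diagnostic option together with its current value.
    for (VMOption option : options) {
      System.out.println(option.getName() + "=" + option.getValue());
    }
  }
}
```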

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.019.patch

- removed the @Ignore annotation from testInterceptorIntercept50times(); I did 
not find any reason why this test was ignored. It seems to work fine 
and does not use latching.
- fixed testPreemptiveFastFailException50Times() by removing the if (pffe) 
condition before incrementing the counter, so the counter is now always 
incremented by Thread-2 and the test no longer fails.
- removed trailing whitespace


> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, 
> HBASE-14422.master.018.patch, HBASE-14422.master.019.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test, 
> testInterceptorIntercept50Times. Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs. 
> Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it is not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, 
> HBASE-14422.master.018.patch, log.txt, trace.log
>





[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15369593#comment-15369593
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


There is another option: the HotSpotDiagnosticMXBean lets us enable the 
FlightRecorder JVM option on the fly with the following code:
{code}
HotSpotDiagnostic bean = new HotSpotDiagnostic();
bean.setVMOption("FlightRecorder", "true");
{code}

Unfortunately, the UnlockCommercialFeatures option is not writeable, so we 
have to start the JVM with -XX:+UnlockCommercialFeatures.
Afterwards, we can enable (-XX:+FlightRecorder) and disable 
(-XX:-FlightRecorder) the JVM option on the fly.
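
Whether a given flag can be changed at runtime can be checked before calling setVMOption. The following is only a sketch under assumptions: it obtains the bean through ManagementFactory rather than with `new`, the class and method names (VmOptionToggle, isWriteable, setIfWriteable, currentValue) are made up for illustration, and the result for each flag depends on the JVM vendor and version:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmOptionToggle {
  private static final HotSpotDiagnosticMXBean BEAN =
      ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

  /** Returns true only if this JVM knows the option and allows changing it at runtime. */
  public static boolean isWriteable(String name) {
    try {
      return BEAN.getVMOption(name).isWriteable();
    } catch (IllegalArgumentException e) {
      return false; // the option does not exist in this JVM
    }
  }

  /** Sets a boolean VM option if it is writeable; returns whether the change was applied. */
  public static boolean setIfWriteable(String name, boolean value) {
    if (!isWriteable(name)) {
      return false;
    }
    BEAN.setVMOption(name, Boolean.toString(value));
    return true;
  }

  /** Returns the current value of an existing VM option as a string. */
  public static String currentValue(String name) {
    return BEAN.getVMOption(name).getValue();
  }

  public static void main(String[] args) {
    // Flag names taken from the comment above; the output differs between JVMs.
    for (String flag : new String[] {"UnlockCommercialFeatures", "FlightRecorder"}) {
      System.out.println(flag + " writeable: " + isWriteable(flag));
    }
  }
}
```

Checking isWriteable() first turns a non-writeable or missing flag into a plain `false` instead of an exception at the call site.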

[~stack], can you please tell me exactly which shell script you want to modify?
Does it make sense to modify hbase-env.sh and add -XX:+UnlockCommercialFeatures 
to the HBASE_MASTER_OPTS and HBASE_REGIONSERVER_OPTS variables?

Another question: do we still need a blog post about this?



> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Priority: Minor
>  Labels: beginner
>





[jira] [Assigned] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy reassigned HBASE-16142:
--

Assignee: Konstantin Ryakhovskiy

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
>





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, 
> HBASE-14422.master.018.patch, log.txt, trace.log
>





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.018.patch

- removed the @Ignore annotation from testInterceptorIntercept50times(); I did 
not find any reason why this test was ignored. It seems to work fine 
and does not use latching.
- fixed testPreemptiveFastFailException50Times() by removing the if (pffe) 
condition before incrementing the counter, so the counter is now always 
incremented by Thread-2 and the test no longer fails.


> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, 
> HBASE-14422.master.018.patch, log.txt, trace.log
>





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-10 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>





[jira] [Commented] (HBASE-16145) MultiRowRangeFilter constructor shouldn't throw IOException

2016-07-05 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363228#comment-15363228
 ] 

Konstantin Ryakhovskiy commented on HBASE-16145:


The failed test TestHRegionWithInMemoryFlush does not use the patched 
functionality; the failure is not related to the patch.

> MultiRowRangeFilter constructor shouldn't throw IOException
> ---
>
> Key: HBASE-16145
> URL: https://issues.apache.org/jira/browse/HBASE-16145
> Project: HBase
>  Issue Type: Wish
>Reporter: Konstantin Ryakhovskiy
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
> Attachments: HBASE-16145.master.001.patch, 
> HBASE-16145.master.002.patch
>
>
> MultiRowRangeFilter constructor declares IOException.
> The constructor:
> - sorts and merges incoming arguments - list of ranges, 
> - assigns sorted list to a private variable and does not do anything else.
> There is no reason to declare IOException.
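
To illustrate why a sort-and-merge step needs no checked exception, here is a self-contained sketch. It is not the HBase implementation: the Range class with int bounds and the name sortAndMerge are hypothetical stand-ins, and ranges are treated as half-open [start, stop):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RangeMergeSketch {
  /** Hypothetical stand-in for a row range: [start, stop). */
  public static final class Range {
    final int start;
    final int stop;

    public Range(int start, int stop) {
      if (start > stop) {
        // Invalid input is a caller error, so an unchecked exception is sufficient.
        throw new IllegalArgumentException("start > stop: " + start + " > " + stop);
      }
      this.start = start;
      this.stop = stop;
    }

    @Override
    public String toString() {
      return "[" + start + "," + stop + ")";
    }
  }

  /** Sorts ranges by start and merges overlapping or adjacent ones; purely in-memory work. */
  public static List<Range> sortAndMerge(List<Range> ranges) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingInt((Range r) -> r.start));
    List<Range> merged = new ArrayList<>();
    for (Range r : sorted) {
      Range last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && r.start <= last.stop) {
        // Overlaps or touches the previous range: extend it instead of adding a new one.
        merged.set(merged.size() - 1, new Range(last.start, Math.max(last.stop, r.stop)));
      } else {
        merged.add(r);
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Range> input = Arrays.asList(
        new Range(5, 8), new Range(1, 3), new Range(2, 6), new Range(10, 12));
    System.out.println(sortAndMerge(input)); // prints [[1,8), [10,12)]
  }
}
```

Nothing here touches I/O, so the only failure mode is invalid input, which an unchecked IllegalArgumentException can report.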





[jira] [Updated] (HBASE-16145) MultiRowRangeFilter constructor shouldn't throw IOException

2016-07-05 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16145:
---
Status: Patch Available  (was: Open)

> MultiRowRangeFilter constructor shouldn't throw IOException
> ---
>
> Key: HBASE-16145
> URL: https://issues.apache.org/jira/browse/HBASE-16145
> Project: HBase
>  Issue Type: Wish
>Reporter: Konstantin Ryakhovskiy
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
> Attachments: HBASE-16145.master.001.patch, 
> HBASE-16145.master.002.patch
>





[jira] [Updated] (HBASE-16145) MultiRowRangeFilter constructor shouldn't throw IOException

2016-07-05 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16145:
---
Attachment: HBASE-16145.master.002.patch

Submitting a new patch.

> MultiRowRangeFilter constructor shouldn't throw IOException
> ---
>
> Key: HBASE-16145
> URL: https://issues.apache.org/jira/browse/HBASE-16145
> Project: HBase
>  Issue Type: Wish
>Reporter: Konstantin Ryakhovskiy
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
> Attachments: HBASE-16145.master.001.patch, 
> HBASE-16145.master.002.patch
>





[jira] [Updated] (HBASE-16145) MultiRowRangeFilter constructor shouldn't throw IOException

2016-07-05 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-16145:
---
Status: Open  (was: Patch Available)

> MultiRowRangeFilter constructor shouldn't throw IOException
> ---
>
> Key: HBASE-16145
> URL: https://issues.apache.org/jira/browse/HBASE-16145
> Project: HBase
>  Issue Type: Wish
>Reporter: Konstantin Ryakhovskiy
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
> Attachments: HBASE-16145.master.001.patch, 
> HBASE-16145.master.002.patch
>





[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360670#comment-15360670
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


After some quick research on this topic: we can start/stop JFR programmatically 
using the following code:
{code}
private FlightRecorderClient startJFR(String name)
    throws IOException, InstanceNotFoundException, NoSuchRecordingException {
  FlightRecorder.registerWithPlatformMBeanServer();
  FlightRecorderClient recorderClient = new FlightRecorderClient();
  ObjectName recordingObj = recorderClient.createRecording(name);
  recorderClient.enableDefaultRecording();
  recorderClient.start(recordingObj);
  return recorderClient;
}

private void stopAndDumpJFR(FlightRecorderClient recorderClient,
    ObjectName recordingObj, String outputPath)
    throws IOException, NoSuchRecordingException {
  recorderClient.stop(recordingObj);
  recorderClient.copyTo(recordingObj, outputPath);
}
{code}

Is this what we need?
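
For comparison, on JDK 11 and later the same start/stop/dump cycle can be sketched with the standard jdk.jfr API. This is a different API from the Oracle-specific FlightRecorderClient used in the comment, so it is a swapped-in technique and not the code under discussion. The names JfrSketch, startRecording and stopAndDump are made up for illustration, and "default" is assumed to be the name of a predefined JFR configuration shipped with the JDK:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class JfrSketch {
  /** Starts a named in-process recording and returns the handle needed to stop it later. */
  public static Recording startRecording(String name) throws IOException, ParseException {
    // Assumption: the JDK ships a predefined configuration called "default".
    Configuration config = Configuration.getConfiguration("default");
    Recording recording = new Recording(config);
    recording.setName(name);
    recording.start();
    return recording;
  }

  /** Stops the recording, writes it to outputPath and releases its resources. */
  public static void stopAndDump(Recording recording, Path outputPath) throws IOException {
    recording.stop();
    recording.dump(outputPath);
    recording.close();
  }

  public static void main(String[] args) throws Exception {
    Recording recording = startRecording("duress");
    Path out = Files.createTempDirectory("jfr-sketch").resolve("duress.jfr");
    stopAndDump(recording, out);
    System.out.println("wrote " + Files.size(out) + " bytes to " + out);
  }
}
```

Returning the Recording handle from the start method keeps everything needed for the later stop and dump in one object.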

> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Priority: Minor
>  Labels: beginner
>





[jira] [Comment Edited] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360650#comment-15360650
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-16142 at 7/3/16 7:46 PM:


According to my findings, custom events are not supported yet (see slide 
#32, "Adding your own events"):
http://www.oracle.com/technetwork/oem/soa-mgmt/con10912-javaflightrecorder-2342054.pdf
As of now, the API is undocumented and marked as deprecated, which means it 
might change in the future.
In the same document, the roadmap says that dynamic enablement of the flight 
recorder will be available in the future.
Should we start with the blog post anyway, describing the "how to", and a prototype?



was (Author: ryakhovskiy.k):
According to my findings, custom events are not supported yet (see slide 
#32, "Adding your own events"):
http://www.oracle.com/technetwork/oem/soa-mgmt/con10912-javaflightrecorder-2342054.pdf
As of now, the API is undocumented and marked as deprecated, which means it 
might change in the future.
In the same document, the roadmap says that the flight recorder will be 
available in the future.
Should we start with the blog post anyway, describing the "how to", and a prototype?


> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Priority: Minor
>  Labels: beginner
>





[jira] [Commented] (HBASE-16142) Trigger JFR session when under duress -- e.g. backed-up request queue count -- and dump the recording to log dir

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360650#comment-15360650
 ] 

Konstantin Ryakhovskiy commented on HBASE-16142:


According to my findings, custom events are not supported yet (see slide 
#32, "Adding your own events"):
http://www.oracle.com/technetwork/oem/soa-mgmt/con10912-javaflightrecorder-2342054.pdf
As of now, the API is undocumented and marked as deprecated, which means it 
might change in the future.
In the same document, the roadmap says that the flight recorder will be 
available in the future.
Should we start with the blog post anyway, describing the "how to", and a prototype?


> Trigger JFR session when under duress -- e.g. backed-up request queue count 
> -- and dump the recording to log dir
> 
>
> Key: HBASE-16142
> URL: https://issues.apache.org/jira/browse/HBASE-16142
> Project: HBase
>  Issue Type: Task
>  Components: Operability
>Reporter: stack
>Priority: Minor
>  Labels: beginner
>





[jira] [Commented] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360593#comment-15360593
 ] 

Konstantin Ryakhovskiy commented on HBASE-14422:


[~stack] the issue reproduces with an additional Thread.sleep(..) before 
latch.await().
I added this Thread.sleep(..) to simulate sufficiently bad hardware, e.g. a long 
context switch.
The log: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2506/testReport/org.apache.hadoop.hbase.client/TestFastFailWithoutTestUtil/testPreemptiveFastFailException50Times/
At iteration #8 (see the line "Time-limited test #7"), Thread2 is in FastFail mode 
(TT-2 difference=1). This means that when the code is in the method 
PreemptiveFastFailInterceptor#inFastFail(), 
EnvironmentEdge.currentTimeMillis is 1 ms greater than (time of the first 
failure + fast fail threshold).

To make the test more robust, we can increment the done counter without 
verification. Instead of the line:
if (pffe) done.incrementAndGet();
we can write directly:
done.incrementAndGet();

Will that work from your perspective?
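
A small standalone sketch of the difference between the guarded and the unconditional increment. It is not the test code itself: the class UnconditionalCounterSketch and its run method are hypothetical, and a random boolean stands in for the timing-dependent pffe flag:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UnconditionalCounterSketch {
  /**
   * Runs the given number of rounds on a second thread and returns the final counter value.
   * With guardIncrement=true the counter only moves when the simulated flag happens to be set.
   */
  public static int run(int iterations, boolean guardIncrement) throws InterruptedException {
    AtomicInteger done = new AtomicInteger();
    CountDownLatch finished = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        // Stand-in for the pffe flag, whose value in the real test depends on timing.
        boolean pffe = ThreadLocalRandom.current().nextBoolean();
        if (!guardIncrement || pffe) {
          done.incrementAndGet();
        }
      }
      finished.countDown();
    }, "Thread-2");
    worker.start();
    if (!finished.await(10, TimeUnit.SECONDS)) {
      throw new IllegalStateException("worker did not finish in time");
    }
    return done.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("guarded:       " + run(50, true));  // varies from run to run
    System.out.println("unconditional: " + run(50, false)); // always 50
  }
}
```

With the guard removed, the final count depends only on the number of rounds, so an assertion on it no longer depends on scheduling or the clock.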


> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.017.patch

Added Thread.sleep(FAST_FAIL_THRESHOLD) before latch.await() in thread-2 to 
simulate a long context switch, which might happen on bad hardware.


> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, HBASE-14422.master.017.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-03 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.016.patch

one more attempt

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, 
> HBASE-14422.master.016.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.015.patch

another try, small threshold

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, HBASE-14422.master.015.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.014.patch

another try

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, 
> HBASE-14422.master.014.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.013.patch

I am submitting another patch in which I've changed the fail-fast threshold to 
make the issue easier to reproduce.

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, HBASE-14422.master.013.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.012.patch

another try

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, 
> HBASE-14422.master.012.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, HBASE-14422.master.011.patch, log.txt, trace.log
>
>


[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, 
> HBASE-14422.master.010.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.
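
A minimal sketch of the wait-on-latch-with-timeout pattern the description attributes to HBASE-14421, assuming a plain java.util.concurrent.CountDownLatch. The class name LatchTimeoutSketch and the method awaitOrDump are hypothetical and are not taken from the HBase test or from any attached patch.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not HBase code: wait on a latch with a timeout and
// dump all thread stacks if the latch is not released in time.
public class LatchTimeoutSketch {

  /**
   * Waits for the latch up to the given timeout. On timeout, prints the stack
   * of every live thread to stderr and returns false so the caller can keep going.
   */
  public static boolean awaitOrDump(CountDownLatch latch, long timeout, TimeUnit unit) {
    try {
      if (latch.await(timeout, unit)) {
        return true; // latch reached zero in time
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
    // Timed out: dump all thread stacks to help diagnose the hang.
    for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
      System.err.println("Thread " + entry.getKey().getName());
      for (StackTraceElement frame : entry.getValue()) {
        System.err.println("    at " + frame);
      }
    }
    return false;
  }

  public static void main(String[] args) {
    CountDownLatch released = new CountDownLatch(1);
    released.countDown();
    System.out.println(awaitOrDump(released, 100, TimeUnit.MILLISECONDS));

    CountDownLatch stuck = new CountDownLatch(1); // never counted down
    System.out.println(awaitOrDump(stuck, 100, TimeUnit.MILLISECONDS));
  }
}
```

A timeout like this keeps the test from hanging, but it only hides the missed countDown(); the thread dump is what would point at the thread that never released the latch.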





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.009.patch

another try

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, HBASE-14422.master.009.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: In Progress)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Work started] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-14422 started by Konstantin Ryakhovskiy.
--
> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Work started] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-14422 started by Konstantin Ryakhovskiy.
--
> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Work stopped] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-14422 stopped by Konstantin Ryakhovskiy.
--
> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-02 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.008.patch

another try

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, 
> HBASE-14422.master.008.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.007.patch

another attempt, expecting a fail

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, HBASE-14422.master.007.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.006.patch

Another try; hope never dies.
It is weird, but I can easily reproduce the bug on my rig.

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.006.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Commented] (HBASE-16108) RowCounter should support multiple key ranges

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359549#comment-15359549
 ] 

Konstantin Ryakhovskiy commented on HBASE-16108:


The Findbugs warning is for HFileWriterV2, which the patch does not touch, 
so it was introduced previously.
The failed test TestStochasticLoadBalancer2 is not related to the patch 
either. I ran the test locally with the patch and it passed.


> RowCounter should support multiple key ranges
> -
>
> Key: HBASE-16108
> URL: https://issues.apache.org/jira/browse/HBASE-16108
> Project: HBase
>  Issue Type: Improvement
>Reporter: Geoffrey Jacoby
>Assignee: Konstantin Ryakhovskiy
> Fix For: 2.0.0, 1.4.0
>
> Attachments: HBASE-16108.branch-1.001.patch, 
> HBASE-16108.master.001.patch, HBASE-16108.master.003.patch, 
> HBASE-16108.master.004.patch, test_HBASE-16108.log, 
> test_TestTableBasedReplicationSourceManagerImpl.log
>
>
> Currently, RowCounter only allows a single key range to be used as a filter. 
> It would be useful in some cases to be able to specify multiple key ranges 
> (or prefixes) in the same job. (For example, counting over a set of Phoenix 
> tenant ids in an unsalted table)
> This could be done by enhancing the existing key range parameter to take 
> multiple start/stop row pairs. Alternately, a new --row-prefixes option could 
> be added, similar to what HBASE-15847 did for VerifyReplication. 
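
A minimal sketch of the first option in the description, a single argument that carries several start/stop row pairs. The "start,stop;start,stop" syntax and the names RowRangeArgSketch, RowRange and parseRanges are assumptions made for illustration only; they are not taken from the attached patches or from the RowCounter source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser, not HBase code: turns "start,stop;start,stop" into row ranges.
public class RowRangeArgSketch {

  /** A start/stop row pair; an empty string means "unbounded" on that side. */
  public static final class RowRange {
    public final String startRow;
    public final String stopRow;

    RowRange(String startRow, String stopRow) {
      this.startRow = startRow;
      this.stopRow = stopRow;
    }

    @Override
    public String toString() {
      return "[" + startRow + ", " + stopRow + ")";
    }
  }

  /** Parses pairs separated by ';', each pair written as "start,stop". */
  public static List<RowRange> parseRanges(String arg) {
    List<RowRange> ranges = new ArrayList<>();
    for (String pair : arg.split(";")) {
      if (pair.isEmpty()) {
        continue; // tolerate empty segments such as a doubled ';'
      }
      // Limit 2 keeps a trailing empty bound, so "x," parses as start "x", stop "".
      String[] bounds = pair.split(",", 2);
      if (bounds.length != 2) {
        throw new IllegalArgumentException("Expected start,stop but got: " + pair);
      }
      ranges.add(new RowRange(bounds[0], bounds[1]));
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (RowRange range : parseRanges("tenant1,tenant2;tenant5,tenant6")) {
      System.out.println(range);
    }
  }
}
```

In a real job each parsed pair would still have to be turned into a scan restriction; HBase ships a MultiRowRangeFilter for that purpose, but whether the patch uses it is not stated here.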





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: (was: HBASE-14422.master.005.patch)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Attachment: HBASE-14422.master.005.patch

Another attempt to submit and observe the failed test.
[~stack], should I delete the old patches that were tested successfully?

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.005.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Patch Available  (was: Open)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, 
> HBASE-14422.master.005.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test that does 
> testInterceptorIntercept50Times Usually it passes but on occasion, the 
> latching between thread 1 and thread 2 goes awry and the test hangs and the 
> test hangs out. Depends on the hardware but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to timeout and do a thread dump and 
> just let the test keep going.
> This issue is about digging in on figuring why the hang up on latches and 
> then fixing it so the test doesn't have to have the latch timeout. Hopefully 
> the threaddump helps.
> This one could be hard to fix since it not easy to reproduce. Marking it 
> beginner anyways.





[jira] [Updated] (HBASE-14422) Fix TestFastFailWithoutTestUtil

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Ryakhovskiy updated HBASE-14422:
---
Status: Open  (was: Patch Available)

> Fix TestFastFailWithoutTestUtil
> ---
>
> Key: HBASE-14422
> URL: https://issues.apache.org/jira/browse/HBASE-14422
> Project: HBase
>  Issue Type: Task
>  Components: test
>Reporter: stack
>Assignee: Konstantin Ryakhovskiy
>Priority: Minor
>  Labels: beginner
> Attachments: HBASE-14422.master.001.patch, 
> HBASE-14422.master.002.patch, HBASE-14422.master.003.patch, 
> HBASE-14422.master.004.patch, HBASE-14422.master.005.patch, log.txt, trace.log
>
>
> TestFastFailWithoutTestUtil has a unit test, 
> testInterceptorIntercept50Times. Usually it passes, but on occasion the 
> latching between thread 1 and thread 2 goes awry and the test hangs. 
> It depends on the hardware, but it seems to happen about one in 
> four runs here on an internal rig.
> HBASE-14421 changed the wait-on-latch to time out, do a thread dump, and 
> just let the test keep going.
> This issue is about figuring out why the test hangs on the latches and 
> then fixing it so the test doesn't need the latch timeout. Hopefully 
> the thread dump helps.
> This one could be hard to fix since it is not easy to reproduce. Marking it 
> beginner anyway.





[jira] [Comment Edited] (HBASE-16108) RowCounter should support multiple key ranges

2016-07-01 Thread Konstantin Ryakhovskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359062#comment-15359062
 ] 

Konstantin Ryakhovskiy edited comment on HBASE-16108 at 7/1/16 2:43 PM:


Submitting a patch for branch-1.
Here I modified the tests a little to make them more independent and to avoid 
depending on test order:
one test was removing data from the test table, while the other tests relied on 
data that was added in the setUp stage.



was (Author: ryakhovskiy.k):
submitting patch for branch-1.
Here I modified a tests little bit to make them more independent and avoid 
test-order (one test was removing data from the test-table).



> RowCounter should support multiple key ranges
> -
>
> Key: HBASE-16108
> URL: https://issues.apache.org/jira/browse/HBASE-16108
> Project: HBase
>  Issue Type: Improvement
> Reporter: Geoffrey Jacoby
> Assignee: Konstantin Ryakhovskiy
> Attachments: HBASE-16108.branch-1.001.patch, 
> HBASE-16108.master.001.patch, HBASE-16108.master.003.patch, 
> HBASE-16108.master.004.patch, test_HBASE-16108.log, 
> test_TestTableBasedReplicationSourceManagerImpl.log
>
>
> Currently, RowCounter only allows a single key range to be used as a filter. 
> It would be useful in some cases to be able to specify multiple key ranges 
> (or prefixes) in the same job. (For example, counting over a set of Phoenix 
> tenant ids in an unsalted table)
> This could be done by enhancing the existing key range parameter to take 
> multiple start/stop row pairs. Alternately, a new --row-prefixes option could 
> be added, similar to what HBASE-15847 did for VerifyReplication. 
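
To make the first option above concrete (a key range parameter that takes multiple start/stop row pairs), here is a minimal parsing sketch. The argument format, with pairs separated by ';' and start/stop separated by ',', is an assumption made for illustration, as are the names RowRangesSketch and parseRanges; none of this is taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the delimiter choice is an assumption.
public class RowRangesSketch {

    /**
     * Parses "start1,stop1;start2,stop2" into a list of {start, stop} pairs.
     * An empty start or stop is kept as "" and could be read as unbounded.
     */
    static List<String[]> parseRanges(String arg) {
        List<String[]> ranges = new ArrayList<>();
        for (String pair : arg.split(";")) {
            // Limit -1 keeps a trailing empty stop row, e.g. "rowA,".
            String[] bounds = pair.split(",", -1);
            if (bounds.length != 2) {
                throw new IllegalArgumentException("Expected start,stop but got: " + pair);
            }
            ranges.add(bounds);
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (String[] range : parseRanges("tenantA,tenantB;tenantM,tenantN")) {
            System.out.println("start=" + range[0] + " stop=" + range[1]);
        }
    }
}
```

Each parsed pair would then have to be turned into a scan restriction; how that is done in the job is outside this sketch.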




