from:"Zheng Hu \(JIRA\)"

[jira] [Assigned] (HBASE-21946) Replace the byte[] pread by ByteBuffer pread in HFileBlock reading once HDFS-3246 prepared

2021-06-28 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-21946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-21946:


Assignee: Wei-Chiu Chuang  (was: Zheng Hu)

> Replace the byte[] pread by ByteBuffer pread in HFileBlock reading once 
> HDFS-3246 prepared
> --
>
> Key: HBASE-21946
> URL: https://issues.apache.org/jira/browse/HBASE-21946
> Project: HBase
>  Issue Type: Improvement
>  Components: Offheaping
>Reporter: Zheng Hu
>Assignee: Wei-Chiu Chuang
>Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Attachments: HBASE-21946.HBASE-21879.v01.patch, 
> HBASE-21946.HBASE-21879.v02.patch, HBASE-21946.HBASE-21879.v03.patch, 
> HBASE-21946.HBASE-21879.v04.patch
>
>
> [~stakiar] is working on HDFS-3246, so for now we have to keep the byte[] pread 
> in HFileBlock reading. Once it is resolved, we can upgrade the Hadoop 
> version and do the replacement. 
> I think it will be a great p999 latency improvement in the 100% Get case. Anyway, 
> I am filing an issue to track this first. 
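
A minimal sketch of the difference between the two pread styles, not the HBase or HDFS code. It assumes java.nio's FileChannel as a stand-in for the HDFS input stream; the class and method names (PreadSketch, preadViaByteArray, preadViaByteBuffer) are hypothetical. The byte[] style needs a temporary on-heap array plus an extra copy, while the ByteBuffer style reads directly into the (possibly off-heap) target.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative only: FileChannel stands in for the HDFS input stream.
final class PreadSketch {

  // byte[]-style pread: fill a temporary on-heap array, then copy it into the target.
  static void preadViaByteArray(FileChannel ch, long pos, ByteBuffer target, int len)
      throws IOException {
    byte[] tmp = new byte[len];                 // extra heap allocation per read
    readFully(ch, pos, ByteBuffer.wrap(tmp));
    target.put(tmp);                            // extra copy into the target
  }

  // ByteBuffer-style pread: read straight into the (possibly off-heap) target.
  static void preadViaByteBuffer(FileChannel ch, long pos, ByteBuffer target, int len)
      throws IOException {
    ByteBuffer window = target.duplicate();     // shares content, independent position/limit
    window.limit(window.position() + len);
    readFully(ch, pos, window);
    target.position(window.position());         // advance the target past the bytes read
  }

  // Positional reads do not move the channel's own position, so track it here.
  private static void readFully(FileChannel ch, long pos, ByteBuffer dst) throws IOException {
    long p = pos;
    while (dst.hasRemaining()) {
      int n = ch.read(dst, p);
      if (n < 0) {
        throw new IOException("Premature EOF at position " + p);
      }
      p += n;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("pread-sketch", ".bin");
    try {
      Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7});
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(4);
        preadViaByteBuffer(ch, 2, offHeap, 4);
        offHeap.flip();
        System.out.println("first byte read: " + offHeap.get(0));
      }
    } finally {
      Files.delete(file);
    }
  }
}
```

Whether this shows up as a p999 improvement depends on the workload; the sketch only shows where the extra allocation and copy come from.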



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HBASE-21946) Replace the byte[] pread by ByteBuffer pread in HFileBlock reading once HDFS-3246 prepared

2021-06-28 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-21946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370564#comment-17370564
 ] 

Zheng Hu commented on HBASE-21946:
--

[~weichiu] Please go ahead; I would be glad to review your patch for this. 
Thanks.

> Replace the byte[] pread by ByteBuffer pread in HFileBlock reading once 
> HDFS-3246 prepared
> --
>
> Key: HBASE-21946
> URL: https://issues.apache.org/jira/browse/HBASE-21946
> Project: HBase
>  Issue Type: Improvement
>  Components: Offheaping
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 2.5.0, 3.0.0-alpha-2
>
> Attachments: HBASE-21946.HBASE-21879.v01.patch, 
> HBASE-21946.HBASE-21879.v02.patch, HBASE-21946.HBASE-21879.v03.patch, 
> HBASE-21946.HBASE-21879.v04.patch
>
>
> [~stakiar] is working on HDFS-3246, so for now we have to keep the byte[] pread 
> in HFileBlock reading. Once it is resolved, we can upgrade the Hadoop 
> version and do the replacement. 
> I think it will be a great p999 latency improvement in the 100% Get case. Anyway, 
> I am filing an issue to track this first. 




[jira] [Commented] (HBASE-22504) Optimize the MultiByteBuff#get(ByteBuffer, offset, len)

2020-06-22 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142525#comment-17142525
 ] 

Zheng Hu commented on HBASE-22504:
--

[~ndimiduk], when I wrote this patch I checked that no other class depends on 
findCommonPrefix, but it is a public method, so removing it would 
indeed introduce a compatibility issue. Let me restore it. Thanks.

> Optimize the MultiByteBuff#get(ByteBuffer, offset, len)
> ---
>
> Key: HBASE-22504
> URL: https://issues.apache.org/jira/browse/HBASE-22504
> Project: HBase
>  Issue Type: Sub-task
>  Components: BucketCache
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: HBASE-22504.HBASE-21879.v01.patch
>
>
> In HBASE-22483, we saw that the BucketCacheWriter thread was quite busy 
> [^BucketCacheWriter-is-busy.png]; the flame graph also indicated that 
> ByteBufferArray#internalTransfer cost ~6% CPU (see 
> [async-prof-pid-25042-cpu-1.svg|https://issues.apache.org/jira/secure/attachment/12970294/async-prof-pid-25042-cpu-1.svg]).
> Because we used hbase.ipc.server.allocator.buffer.size=64KB, each 
> HFileBlock will be backed by a MultiByteBuff: one 64KB off-heap ByteBuffer 
> and one small heap ByteBuffer. 
> The path currently depends on MultiByteBuff#get(ByteBuffer, offset, len): 
> {code:java}
> RAMQueueEntry#writeToCache
> |--> ByteBufferIOEngine#write
> |--> ByteBufferArray#internalTransfer
> |--> ByteBufferArray$WRITER
> |--> MultiByteBuff#get(ByteBuffer, offset, len)
> {code}
> The MultiByteBuff#get implementation is simple and crude now, so we can optimize 
> it:
> {code:java}
>   @Override
>   public void get(ByteBuffer out, int sourceOffset, int length) {
>     checkRefCount();
>     // Not used from real read path actually. So not going with
>     // optimization
>     for (int i = 0; i < length; ++i) {
>       out.put(this.get(sourceOffset + i));
>     }
>   }
> {code}
>  
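
One possible direction for the optimization, sketched under assumptions rather than taken from the attached patch: copy once per underlying segment with a bulk put instead of once per byte. MultiSegmentCopy is a hypothetical stand-in and not the HBase MultiByteBuff class; it assumes each segment holds its data from index 0 up to its limit.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a multi-segment buffer; not the HBase MultiByteBuff class.
final class MultiSegmentCopy {

  // Copies `length` bytes, starting at logical offset `sourceOffset` of the
  // concatenated segments, into `out` using one bulk put per touched segment.
  static void get(ByteBuffer[] segments, ByteBuffer out, int sourceOffset, int length) {
    long total = 0;
    for (ByteBuffer seg : segments) {
      total += seg.limit();
    }
    if (sourceOffset < 0 || length < 0 || (long) sourceOffset + length > total) {
      throw new IndexOutOfBoundsException(
          "offset=" + sourceOffset + ", length=" + length + ", total=" + total);
    }

    int remaining = length;
    int offset = sourceOffset;
    for (ByteBuffer seg : segments) {
      if (remaining == 0) {
        break;
      }
      int segLen = seg.limit();
      if (offset >= segLen) {
        offset -= segLen;                 // requested range starts after this segment
        continue;
      }
      int toCopy = Math.min(remaining, segLen - offset);
      ByteBuffer view = seg.duplicate();  // leaves the segment's own position/limit untouched
      view.position(offset);
      view.limit(offset + toCopy);
      out.put(view);                      // bulk copy instead of a per-byte loop
      remaining -= toCopy;
      offset = 0;                         // later segments are read from their start
    }
  }
}
```

With a 64KB off-heap segment followed by a small heap segment, this does at most two bulk copies per block instead of one put per byte.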




[jira] [Commented] (HBASE-24097) [Flakey Tests] TestSnapshotScannerHDFSAclController#testRestoreSnapshot

2020-03-31 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-24097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072378#comment-17072378
 ] 

Zheng Hu commented on HBASE-24097:
--

Ping [~meiyi]

> [Flakey Tests] TestSnapshotScannerHDFSAclController#testRestoreSnapshot
> ---
>
> Key: HBASE-24097
> URL: https://issues.apache.org/jira/browse/HBASE-24097
> Project: HBase
>  Issue Type: Bug
>  Components: flakies
>Reporter: Michael Stack
>Priority: Major
>
> Fails on occasion, 15% of the time according to flakie report. I can 
> reproduce it failing locally. A single method fails. I don't follow how it is 
> supposed to work (what looks wrong to me passes...). I noticed that if I ran 
> testRestoreSnapshot on its own, it passed but failed when run as part of the 
> test suite so I broke it out into its own suite. Now both old and new suites 
> pass for me locally after 20 repeats. Let me push it up. 




[jira] [Assigned] (HBASE-23656) [MERGETOOL] HBASE Support Merge region by pattern

2020-01-08 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-23656:


Assignee: zhengsicheng

> [MERGETOOL] HBASE Support Merge region by pattern
> -
>
> Key: HBASE-23656
> URL: https://issues.apache.org/jira/browse/HBASE-23656
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Reporter: zhengsicheng
>Assignee: zhengsicheng
>Priority: Minor
> Fix For: 3.0.0
>
>
> Design Objective:
>  # Merge empty region
>  # Neat region
>  # merge expired region
> Usage: bin/hbase onlinemerge [--tableName=] [--startRegion=] [--stopRegion=] 
> [--maxRegionSize=] [--maxRegionCreateTime=] [--numMaxMergePlans=] 
> [--targetRegionCount=] [--printExecutionPlan=] [--configMergePauseTime=]
> Options:
>  --h or --h print help
>  --tableName table name must be not null
>  --startRegion start region
>  --stopRegion stop region
>  --maxRegionSize max region size Unit GB
>  --maxRegionCreateTime max Region Create Time /MM/dd HH:mm:ss
>  --numMaxMergePlans num MaxMerge Plans
>  --targetRegionCount target Region Count
>  --configMergePauseTime config Merge Pause Time In milliseconds
>  --printExecutionPlan Value default is true print execution plans false is 
> execution merge
> Examples:
>  bin/hbase onlinemerge --tableName=test:test1 
> --startRegion=test:test1,,1576835912332.01d0d6c2b41e204104524d9aec6074fb. 
> --stopRegion=test:test1,,1573044786980.0c9b5bd93f3b19eb9bd1a1011ddff66f.
>  --maxRegionSize=0 --maxRegionCreateTime=/MM/dd HH:mm:ss 
> --numMaxMergePlans=2 --targetRegionCount=4 --printExecutionPlan=false




[jira] [Resolved] (HBASE-23251) Add Column Family and Table Names to HFileContext and use in HFileWriterImpl logging

2019-11-11 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-23251.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

> Add Column Family and Table Names to HFileContext and use in HFileWriterImpl 
> logging
> 
>
> Key: HBASE-23251
> URL: https://issues.apache.org/jira/browse/HBASE-23251
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-23251.v01.patch
>
>
> When something goes wrong in the Store / HFile write path, it would be very 
> useful to know which column family and table the error is coming from. 
> Currently the HFileWriterImpl gets an HFileContext object with some useful 
> state information, but the column family and table aren't among them. 
> For example, this would be very helpful diagnosing HBASE-23143 and similar 
> issues. 




[jira] [Commented] (HBASE-22480) Get block from BlockCache once and return this block to BlockCache twice make ref count error.

2019-11-07 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969730#comment-16969730
 ] 

Zheng Hu commented on HBASE-22480:
--

The UT failure is unrelated to the patch, +1 to commit the patch. [~binlijin]

> Get block from BlockCache once and return this block to BlockCache twice make 
> ref count error.
> --
>
> Key: HBASE-22480
> URL: https://issues.apache.org/jira/browse/HBASE-22480
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.2.2
>Reporter: Lijin Bin
>Assignee: Lijin Bin
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
> Attachments: HBASE-22480-branch-2.2-v1.patch, 
> HBASE-22480-branch-2.2-v1.patch, HBASE-22480-branch-2.2-v1.patch, 
> HBASE-22480-branch-2.2-v2.patch, HBASE-22480-master-v1.patch, 
> HBASE-22480-master-v2.patch, HBASE-22480-master-v3.patch, 
> HBASE-22480-master-v4.patch, HBASE-22480-master-v5.patch, 
> HBASE-22480-master-v6.patch, HBASE-22480-master-v6.patch, 
> HBASE-22480-master-v6.patch, HBASE-22480-master-v7.patch, 
> HBASE-22480-master-v7.patch
>
>
> After debugging HBASE-22433, I found the problem: we get a block 
> from BucketCache once but return it to BucketCache twice, which corrupts the 
> ref count; sometimes the refCount can even become negative.
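
To illustrate the failure mode described above, here is a minimal sketch of a reference counter that refuses to go below zero, so a double release fails loudly instead of silently producing a negative count. RefCountedBlock is a hypothetical class for illustration only and is not the HBase or Netty implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted block; the creator starts with one reference.
final class RefCountedBlock {
  private final AtomicInteger refCnt = new AtomicInteger(1);

  // Takes an additional reference; fails if the block was already fully released.
  RefCountedBlock retain() {
    for (;;) {
      int c = refCnt.get();
      if (c <= 0) {
        throw new IllegalStateException("retain() on a released block");
      }
      if (refCnt.compareAndSet(c, c + 1)) {
        return this;
      }
    }
  }

  // Drops one reference; returns true when the count reaches zero.
  // A release without a matching retain throws instead of going negative.
  boolean release() {
    for (;;) {
      int c = refCnt.get();
      if (c <= 0) {
        throw new IllegalStateException("release() called more often than retain()");
      }
      if (refCnt.compareAndSet(c, c - 1)) {
        return c == 1;
      }
    }
  }

  int refCnt() {
    return refCnt.get();
  }
}
```

A guard like this does not fix the double return itself, but it turns the bug into an immediate exception at the faulty call site.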




[jira] [Commented] (HBASE-23258) Reconsider the DBB memory leak in the exceptional paths.

2019-11-05 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968048#comment-16968048
 ] 

Zheng Hu commented on HBASE-23258:
--

Thanks [~binlijin] for your feedback. Let us link that issue and do the 
reviewing in that JIRA.

> Reconsider the DBB memory leak in the exceptional paths.
> 
>
> Key: HBASE-23258
> URL: https://issues.apache.org/jira/browse/HBASE-23258
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> After a discussion with [~anoop.hbase], we find there are still some exceptional 
> paths in which we do not handle DBB#release correctly. More details 
> follow: 
> {code}
> > HFileReaderImpl#validateBlockType - When it throws an Exception, the fetched block is 
> > gone. Want to return?
> Yeah, we have two cases for validateBlockType: one is reading from the cache, 
> the other is reading from the hfile. 
> We've abstracted the block for both cache types and hfiles (e.g. releasing a 
> block from heap will do nothing), so
> it seems we should release the block here if an IOException is thrown.
> > Only inside another condition we return and evict block
> readBlock - The fetched block is unused when we throw an Exception on DBE type 
> mismatch
> Hmm, we finally return null, so it seems the block also needs to be released.
> {code}
> I will look through all the exceptional paths and consider any other DBB 
> leak issues.
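
The pattern under discussion can be sketched as follows: if validation throws, the block obtained earlier is released before the exception propagates. This is a hedged illustration, not the HFileReaderImpl code; the Block interface, CountingBlock, and readValidated are hypothetical names.

```java
import java.io.IOException;

final class ReleaseOnErrorSketch {

  // Hypothetical minimal view of a ref-counted block.
  interface Block {
    String type();
    void release();
  }

  // Test helper that records how often release() was called.
  static final class CountingBlock implements Block {
    private final String type;
    int releases = 0;

    CountingBlock(String type) {
      this.type = type;
    }

    @Override
    public String type() {
      return type;
    }

    @Override
    public void release() {
      releases++;
    }
  }

  // Returns the block to the caller on success; on a type mismatch the block
  // is released here, because the caller never receives a reference to it.
  static Block readValidated(Block block, String expectedType) throws IOException {
    boolean success = false;
    try {
      if (!expectedType.equals(block.type())) {
        throw new IOException(
            "Expected block type " + expectedType + " but got " + block.type());
      }
      success = true;
      return block;
    } finally {
      if (!success) {
        block.release();
      }
    }
  }
}
```

On the success path ownership passes to the caller, who remains responsible for the eventual release.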




[jira] [Commented] (HBASE-23260) Table metrics no longer update once a new table is created

2019-11-05 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968032#comment-16968032
 ] 

Zheng Hu commented on HBASE-23260:
--

Would you mind preparing a patch for this, [~xinxin fan]? 

> Table metrics no longer update once a new table is created 
> ---
>
> Key: HBASE-23260
> URL: https://issues.apache.org/jira/browse/HBASE-23260
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 2.2.1
>Reporter: xinxin fan
>Priority: Major
>
> Once a new table is created, the thread 
> MetricsTableWrapperAggregateImpl$TableMetricsWrapperRunnable no longer 
> works because a java.util.NoSuchElementException is thrown. The details are as 
> follows:
>  
> CLASS            org.apache.hadoop.hbase.regionserver.MetricsTableWrapperAggregateImpl$TableMetricsWrapperRunnable
> METHOD           run
> RETURN           false
> EXCEPTION        true
> THROW-EXCEPTION  java.util.NoSuchElementException: No value present
>     at java.util.OptionalLong.getAsLong(OptionalLong.java:118)
>     at org.apache.hadoop.hbase.regionserver.MetricsTableWrapperAggregateImpl$TableMetricsWrapperRunnable.run(MetricsTableWrapperAggregateImpl.java:80)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
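
The trace shows the exception coming from OptionalLong.getAsLong(). As a general illustration of how that happens (the actual cause inside MetricsTableWrapperAggregateImpl is not shown in this report, so this is an assumption), aggregating over an empty collection yields an empty OptionalLong, and getAsLong() on it throws. OptionalLongSketch and its methods are hypothetical names.

```java
import java.util.OptionalLong;
import java.util.stream.LongStream;

final class OptionalLongSketch {

  // Throws java.util.NoSuchElementException when `values` is empty,
  // because max() of an empty stream is an empty OptionalLong.
  static long maxUnchecked(long[] values) {
    OptionalLong max = LongStream.of(values).max();
    return max.getAsLong();
  }

  // Falls back to a default instead of throwing when there is nothing to aggregate.
  static long maxOrDefault(long[] values, long defaultValue) {
    return LongStream.of(values).max().orElse(defaultValue);
  }
}
```

Because the runnable is scheduled through a ScheduledThreadPoolExecutor (visible in the trace), an uncaught exception in run() also suppresses all later executions of a periodic task, which matches the thread no longer working afterwards.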




[jira] [Updated] (HBASE-23258) Reconsider the DBB memory leak in the exceptional paths.

2019-11-05 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23258:
-
Fix Version/s: 2.3.0
   3.0.0

> Reconsider the DBB memory leak in the exceptional paths.
> 
>
> Key: HBASE-23258
> URL: https://issues.apache.org/jira/browse/HBASE-23258
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> After a discussion with [~anoop.hbase], we find there are still some exceptional 
> paths in which we do not handle DBB#release correctly. More details 
> follow: 
> {code}
> > HFileReaderImpl#validateBlockType - When it throws an Exception, the fetched block is 
> > gone. Want to return?
> Yeah, we have two cases for validateBlockType: one is reading from the cache, 
> the other is reading from the hfile. 
> We've abstracted the block for both cache types and hfiles (e.g. releasing a 
> block from heap will do nothing), so
> it seems we should release the block here if an IOException is thrown.
> > Only inside another condition we return and evict block
> readBlock - The fetched block is unused when we throw an Exception on DBE type 
> mismatch
> Hmm, we finally return null, so it seems the block also needs to be released.
> {code}
> I will look through all the exceptional paths and consider any other DBB 
> leak issues.




[jira] [Updated] (HBASE-23258) Reconsider the DBB memory leak in the exceptional paths.

2019-11-05 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23258:
-
Issue Type: Bug  (was: Improvement)

> Reconsider the DBB memory leak in the exceptional paths.
> 
>
> Key: HBASE-23258
> URL: https://issues.apache.org/jira/browse/HBASE-23258
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> After a discussion with [~anoop.hbase], we find there are still some exceptional 
> paths in which we do not handle DBB#release correctly. More details 
> follow: 
> {code}
> > HFileReaderImpl#validateBlockType - When it throws an Exception, the fetched block is 
> > gone. Want to return?
> Yeah, we have two cases for validateBlockType: one is reading from the cache, 
> the other is reading from the hfile. 
> We've abstracted the block for both cache types and hfiles (e.g. releasing a 
> block from heap will do nothing), so
> it seems we should release the block here if an IOException is thrown.
> > Only inside another condition we return and evict block
> readBlock - The fetched block is unused when we throw an Exception on DBE type 
> mismatch
> Hmm, we finally return null, so it seems the block also needs to be released.
> {code}
> I will look through all the exceptional paths and consider any other DBB 
> leak issues.




[jira] [Updated] (HBASE-23258) Reconsider the DBB memory leak in the exceptional paths.

2019-11-05 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23258:
-
Description: 
Have a discussion with [~anoop.hbase],  we find still have some exceptional 
paths which we did not handle DBB#release correctly. More details are the 
following: 

{code}
> HFileReaderImpl#validateBlockType - When throws Exception, the get block is 
> gone. Want to return?
Yeah, we've two cases to validateBlockType. one is reading from cache, another 
one is reading from hfile. 
we've abstracted the block for both cache types and hfiles ( ex release the 
block from heap will donothing), so
here seems we should do release if throws IOException.

> Only inside another condition we return and evict block
readBlock - The got block is unused when we throw Exception on DBE type mismatch
Em..we return null finally , seems the block also need to release the block.
{code}

Will take a look around  all the exceptional paths and consider any other DBB 
leak issues.


  was:
Have a discussion with [~anoop.hbase],  we found still have some exceptional 
paths which we did not handle DBB#release correctly. More details are the 
following: 

{code}
> HFileReaderImpl#validateBlockType - When throws Exception, the get block is 
> gone. Want to return?
Yeah, we've two cases to validateBlockType. one is reading from cache, another 
one is reading from hfile. 
we've abstracted the block for both cache types and hfiles ( ex release the 
block from heap will donothing), so
here seems we should do release if throws IOException.

> Only inside another condition we return and evict block
readBlock - The got block is unused when we throw Exception on DBE type mismatch
Em..we return null finally , seems the block also need to release the block.
{code}

Will take a look around  all the exceptional paths and consider any other DBB 
leak issues.



> Reconsider the DBB memory leak in the exceptional paths.
> 
>
> Key: HBASE-23258
> URL: https://issues.apache.org/jira/browse/HBASE-23258
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> Have a discussion with [~anoop.hbase],  we find still have some exceptional 
> paths which we did not handle DBB#release correctly. More details are the 
> following: 
> {code}
> > HFileReaderImpl#validateBlockType - When throws Exception, the get block is 
> > gone. Want to return?
> Yeah, we've two cases to validateBlockType. one is reading from cache, 
> another one is reading from hfile. 
> we've abstracted the block for both cache types and hfiles ( ex release the 
> block from heap will donothing), so
> here seems we should do release if throws IOException.
> > Only inside another condition we return and evict block
> readBlock - The got block is unused when we throw Exception on DBE type 
> mismatch
> Em..we return null finally , seems the block also need to release the block.
> {code}
> Will take a look around  all the exceptional paths and consider any other DBB 
> leak issues.




[jira] [Created] (HBASE-23258) Reconsider the DBB memory leak in the exceptional paths.

2019-11-05 Thread Zheng Hu (Jira)

Zheng Hu created HBASE-23258:


 Summary: Reconsider the DBB memory leak in the exceptional paths.
 Key: HBASE-23258
 URL: https://issues.apache.org/jira/browse/HBASE-23258
 Project: HBase
  Issue Type: Improvement
Reporter: Zheng Hu
Assignee: Zheng Hu


After a discussion with [~anoop.hbase], we found there are still some exceptional 
paths in which we do not handle DBB#release correctly. More details 
follow: 

{code}
> HFileReaderImpl#validateBlockType - When it throws an Exception, the fetched block is 
> gone. Want to return?
Yeah, we have two cases for validateBlockType: one is reading from the cache, the 
other is reading from the hfile. 
We've abstracted the block for both cache types and hfiles (e.g. releasing a 
block from heap will do nothing), so
it seems we should release the block here if an IOException is thrown.

> Only inside another condition we return and evict block
readBlock - The fetched block is unused when we throw an Exception on DBE type mismatch
Hmm, we finally return null, so it seems the block also needs to be released.
{code}

I will look through all the exceptional paths and consider any other DBB 
leak issues.





[jira] [Commented] (HBASE-23184) The HeapAllocation in WebUI is not accurate

2019-10-17 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954242#comment-16954242
 ] 

Zheng Hu commented on HBASE-23184:
--

It seems that allocating heap buffers from the static HEAP rather than from the 
ByteBuffAllocator instance is not a good design (at least for the heap-related 
metrics). FYI [~anoop.hbase].
I think it would be better to allocate heap buffers from the 
ByteBuffAllocator instance (rather than the static HEAP); then, for the 
heap allocation metrics, we only need to consider the 
ByteBuffAllocator instance.
 

> The HeapAllocation in WebUI is not accurate
> ---
>
> Key: HBASE-23184
> URL: https://issues.apache.org/jira/browse/HBASE-23184
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Reporter: chenxu
>Priority: Minor
>
> HeapAllocation in WebUI is always 0, the same reason as HBASE-22663




[jira] [Resolved] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-16 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-23107.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to branch-2 & master. Thanks [~javaman_chen] for contributing, and 
thanks all for the reviews & feedback.

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, HFile
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>  Labels: gc
> Fix For: 3.0.0, 2.3.0
>
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.
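
A minimal sketch of the idea, not the committed patch: ByteArrayOutputStream.toByteArray() always allocates and fills a fresh array, whereas a subclass can copy its internal buffer straight into a destination ByteBuffer. ExposedByteArrayOutputStream and copyTo are hypothetical names, and ByteBuffer.allocateDirect stands in here for a buffer obtained from ByteBuffAllocator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical subclass; relies on the protected `buf` and `count` fields
// of java.io.ByteArrayOutputStream.
final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {

  // Copies the bytes written so far directly into `dst`, without the
  // intermediate array that toByteArray() would allocate.
  synchronized void copyTo(ByteBuffer dst) {
    dst.put(buf, 0, count);
  }

  public static void main(String[] args) {
    ExposedByteArrayOutputStream baos = new ExposedByteArrayOutputStream();
    baos.write(new byte[] {1, 2, 3, 4}, 0, 4);

    // Stand-in for a pooled buffer; a real allocator would hand this out.
    ByteBuffer dst = ByteBuffer.allocateDirect(baos.size());
    baos.copyTo(dst);
    dst.flip();
    System.out.println("copied bytes: " + dst.remaining());
  }
}
```

The bytes are still copied once, but the short-lived heap array per cached block disappears, which is where the GC benefit would come from.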




[jira] [Updated] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-16 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23107:
-
Labels: gc  (was: )

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, HFile
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>  Labels: gc
> Fix For: 3.0.0, 2.3.0
>
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.




[jira] [Updated] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-16 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23107:
-
Fix Version/s: 2.3.0
   3.0.0

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.




[jira] [Updated] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-16 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23107:
-
Component/s: BlockCache

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>  Components: BlockCache, HFile
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.




[jira] [Updated] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-16 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23107:
-
Component/s: HFile

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>  Components: HFile
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: flamegraph_after.svg, flamegraph_before.svg
>
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.




[jira] [Commented] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-13 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16950657#comment-16950657
 ] 

Zheng Hu commented on HBASE-23107:
--

[~javaman_chen], would you mind providing some perf data to show the reduction in GC 
pressure? Maybe an async-profiler heap flame graph could serve as 
proof. I would be glad to see one if you have it. 
Thanks.

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in order to 
> copy the block’s data; we can avoid this by using ByteBuffAllocator. This can 
> improve GC performance in write-heavy scenarios.




[jira] [Commented] (HBASE-23107) Avoid temp byte array creation when doing cacheDataOnWrite

2019-10-11 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949868#comment-16949868
 ] 

Zheng Hu commented on HBASE-23107:
--

Left a few comments, please take a look. [~javaman_chen]

> Avoid temp byte array creation when doing cacheDataOnWrite
> --
>
> Key: HBASE-23107
> URL: https://issues.apache.org/jira/browse/HBASE-23107
> Project: HBase
>  Issue Type: Improvement
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
>
> code in HFileBlock.Writer.cloneUncompressedBufferWithHeader
> {code:java}
> ByteBuffer cloneUncompressedBufferWithHeader() {
>   expectState(State.BLOCK_READY);
>   byte[] uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   …
> }
> {code}
> When the cacheOnWrite feature is enabled, a temporary byte array is created in 
> order to copy the block’s data. We can avoid this by using ByteBuffAllocator, 
> which can improve GC performance in write-heavy scenarios.




[jira] [Commented] (HBASE-23140) Remove unknown table error

2019-10-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947460#comment-16947460
 ] 

Zheng Hu commented on HBASE-23140:
--

This bug was introduced by HBASE-21689, and HBASE-21689 was not committed to 
2.1, so there is no need to commit this patch to 2.1.x.  

Also, I think I did not describe the commit status clearly, so I changed the fix 
version and the comment.

> Remove unknown table error
> --
>
> Key: HBASE-23140
> URL: https://issues.apache.org/jira/browse/HBASE-23140
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> "hbase:quota" is created automatically when hbase.quota.enabled is set to 
> true, but if this feature is disabled, describe_namespace should not throw an 
> unknown table error. 
> {code:java}
> hbase(main):025:0> describe_namespace 'hbase'
> DESCRIPTION
> {NAME => 'hbase'}
> QUOTAS
> ERROR: Unknown table hbase:quota!
> For usage try 'help "describe_namespace"'
> {code}
>  




[jira] [Updated] (HBASE-23140) Remove unknown table error

2019-10-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-23140:
-
Fix Version/s: 2.3.0

> Remove unknown table error
> --
>
> Key: HBASE-23140
> URL: https://issues.apache.org/jira/browse/HBASE-23140
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> "hbase:quota" is created automatically when hbase.quota.enabled is set to 
> true, but if this feature is disabled, describe_namespace should not throw an 
> unknown table error. 
> {code:java}
> hbase(main):025:0> describe_namespace 'hbase'
> DESCRIPTION
> {NAME => 'hbase'}
> QUOTAS
> ERROR: Unknown table hbase:quota!
> For usage try 'help "describe_namespace"'
> {code}
>  




[jira] [Comment Edited] (HBASE-23140) Remove unknown table error

2019-10-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947437#comment-16947437
 ] 

Zheng Hu edited comment on HBASE-23140 at 10/9/19 8:57 AM:
---

Pushed to branch-2.2, branch-2 and master. Thanks [~kpalanisamy] for the 
contribution.


was (Author: openinx):
Pushed to branch-2.2 and master. Thanks [~kpalanisamy] for the contribution.

> Remove unknown table error
> --
>
> Key: HBASE-23140
> URL: https://issues.apache.org/jira/browse/HBASE-23140
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.2
>
>
> "hbase:quota" is created automatically when hbase.quota.enabled is set to 
> true, but if this feature is disabled, describe_namespace should not throw an 
> unknown table error. 
> {code:java}
> hbase(main):025:0> describe_namespace 'hbase'
> DESCRIPTION
> {NAME => 'hbase'}
> QUOTAS
> ERROR: Unknown table hbase:quota!
> For usage try 'help "describe_namespace"'
> {code}
>  




[jira] [Commented] (HBASE-23140) Remove unknown table error

2019-10-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947437#comment-16947437
 ] 

Zheng Hu commented on HBASE-23140:
--

Pushed to branch-2.2 and master. Thanks [~kpalanisamy] for the contribution.

> Remove unknown table error
> --
>
> Key: HBASE-23140
> URL: https://issues.apache.org/jira/browse/HBASE-23140
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Minor
> Fix For: 3.0.0, 2.2.2
>
>
> "hbase:quota" is created automatically when hbase.quota.enabled is set to 
> true, but if this feature is disabled, describe_namespace should not throw an 
> unknown table error. 
> {code:java}
> hbase(main):025:0> describe_namespace 'hbase'
> DESCRIPTION
> {NAME => 'hbase'}
> QUOTAS
> ERROR: Unknown table hbase:quota!
> For usage try 'help "describe_namespace"'
> {code}
>  




[jira] [Resolved] (HBASE-23140) Remove unknown table error

2019-10-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-23140.
--
Fix Version/s: 2.2.2
   3.0.0
   Resolution: Fixed

> Remove unknown table error
> --
>
> Key: HBASE-23140
> URL: https://issues.apache.org/jira/browse/HBASE-23140
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Minor
> Fix For: 3.0.0, 2.2.2
>
>
> "hbase:quota" is created automatically when hbase.quota.enabled is set to 
> true, but if this feature is disabled, describe_namespace should not throw an 
> unknown table error. 
> {code:java}
> hbase(main):025:0> describe_namespace 'hbase'
> DESCRIPTION
> {NAME => 'hbase'}
> QUOTAS
> ERROR: Unknown table hbase:quota!
> For usage try 'help "describe_namespace"'
> {code}
>  




[jira] [Resolved] (HBASE-23135) NoSuchMethodError: org.apache.hadoop.hbase.CellComparator.getInstance() while trying to bulk load in hbase using spark

2019-10-08 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-23135.
--
Resolution: Not A Problem

> NoSuchMethodError: org.apache.hadoop.hbase.CellComparator.getInstance() while 
> trying to bulk load in hbase using spark
> --
>
> Key: HBASE-23135
> URL: https://issues.apache.org/jira/browse/HBASE-23135
> Project: HBase
>  Issue Type: Bug
>Reporter: Bikkumala Karthik
>Priority: Major
>
> I am trying to Bulk Load data from HDFS to HBase. I used the following 
> example 
> [https://github.com/apache/hbase-connectors/blob/master/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkLoadExample.java]
>  
> I built the module with the following command: 
> mvn -Dspark.version=2.4.3 -Dscala.version=2.11.7 -Dscala.binary.version=2.11 
> clean install
> When I tried to run the example using the spark-submit command, I got 
> the following error: 
> {quote}Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.CellComparator.getInstance()Lorg/apache/hadoop/hbase/CellComparator;
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter$Builder.<init>(StoreFileWriter.java:348)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$getNewHFileWriter(HBaseContext.scala:928)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$2.apply(HBaseContext.scala:1023)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$2.apply(HBaseContext.scala:972)
> at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:79)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$writeValueToHFile(HBaseContext.scala:972)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3$$anonfun$apply$7.apply(HBaseContext.scala:677)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3$$anonfun$apply$7.apply(HBaseContext.scala:675)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3.apply(HBaseContext.scala:675)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3.apply(HBaseContext.scala:664)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:490)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$foreachPartition$1.apply(HBaseContext.scala:106)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$foreachPartition$1.apply(HBaseContext.scala:106)
> at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
> at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:121)
> at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
>  
> Please find the code here (pom.xml, mvn dependency tree, source file): 
> [https://gist.github.com/bikkumala/d2e349c7bfaffc673e8a641ff3ec9d33]
> I tried with the following versions
> Spark : 2.4.x
> HBase : 2.0.x
> Hadoop : 2.7.x
>  




[jira] [Commented] (HBASE-23135) NoSuchMethodError: org.apache.hadoop.hbase.CellComparator.getInstance() while trying to bulk load in hbase using spark

2019-10-08 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946847#comment-16946847
 ] 

Zheng Hu commented on HBASE-23135:
--

I've checked the code and found that CellComparator.java in branch-2.0 has 
the getInstance method while branch-1.x does not, so I guess there is an 
hbase-common jar conflict in your project, say one hbase-common with 
version 1.x and one with version 2.x. Please check.

It's not a bug, so please don't file a JIRA when you have a question; please 
send an email to the user mailing list instead :-) . I'll close this issue as 
Not a Problem.
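
One way to check for such a conflict is to ask the JVM which jar a class was actually loaded from and whether the expected method is present. The following is a rough sketch using only JDK reflection; the class name `WhichJar` and its helpers are made up for illustration, and the defaults (`java.lang.String`, `length`) are placeholders so that it runs without HBase on the classpath.

```java
import java.security.CodeSource;

public class WhichJar {

  /** Returns where the class was loaded from, or a marker for JDK/bootstrap classes. */
  static String locationOf(Class<?> c) {
    CodeSource src = c.getProtectionDomain().getCodeSource();
    return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
  }

  /** True if the class exposes a public no-argument method with this name. */
  static boolean hasNoArgMethod(Class<?> c, String name) {
    try {
      c.getMethod(name);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Placeholders; pass the class and method you want to inspect as arguments.
    String className = args.length > 0 ? args[0] : "java.lang.String";
    String methodName = args.length > 1 ? args[1] : "length";
    Class<?> c = Class.forName(className);
    System.out.println(className + " loaded from " + locationOf(c));
    System.out.println(methodName + "() present: " + hasNoArgMethod(c, methodName));
  }
}
```

Running it on the same classpath as the Spark job with the arguments `org.apache.hadoop.hbase.CellComparator getInstance` should show which hbase-common jar wins; if the printed location is a 1.x jar, that would be consistent with the NoSuchMethodError above.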

> NoSuchMethodError: org.apache.hadoop.hbase.CellComparator.getInstance() while 
> trying to bulk load in hbase using spark
> --
>
> Key: HBASE-23135
> URL: https://issues.apache.org/jira/browse/HBASE-23135
> Project: HBase
>  Issue Type: Bug
>Reporter: Bikkumala Karthik
>Priority: Major
>
> I am trying to Bulk Load data from HDFS to HBase. I used the following 
> example 
> [https://github.com/apache/hbase-connectors/blob/master/spark/hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkLoadExample.java]
>  
> I built the module with the following command: 
> mvn -Dspark.version=2.4.3 -Dscala.version=2.11.7 -Dscala.binary.version=2.11 
> clean install
> When I tried to run the example using the spark-submit command, I got 
> the following error: 
> {quote}Caused by: java.lang.NoSuchMethodError: 
> org.apache.hadoop.hbase.CellComparator.getInstance()Lorg/apache/hadoop/hbase/CellComparator;
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter$Builder.<init>(StoreFileWriter.java:348)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$getNewHFileWriter(HBaseContext.scala:928)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$2.apply(HBaseContext.scala:1023)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$2.apply(HBaseContext.scala:972)
> at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:79)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$writeValueToHFile(HBaseContext.scala:972)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3$$anonfun$apply$7.apply(HBaseContext.scala:677)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3$$anonfun$apply$7.apply(HBaseContext.scala:675)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3.apply(HBaseContext.scala:675)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$bulkLoad$3.apply(HBaseContext.scala:664)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext.org$apache$hadoop$hbase$spark$HBaseContext$$hbaseForeachPartition(HBaseContext.scala:490)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$foreachPartition$1.apply(HBaseContext.scala:106)
> at 
> org.apache.hadoop.hbase.spark.HBaseContext$$anonfun$foreachPartition$1.apply(HBaseContext.scala:106)
> at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
> at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:121)
> at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {quote}
>  
> Please find the code here (pom.xml, mvn dependency tree, source file): 
> [https://gist.github.com/bikkumala/d2e349c7bfaffc673e8a641ff3ec9d33]
> I tried with the following versions
> Spark : 2.4.x
> HBase : 2.0.x
> Hadoop : 2.7.x
>  




[jira] [Commented] (HBASE-22903) alter_status command is broken

2019-10-07 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946382#comment-16946382
 ] 

Zheng Hu commented on HBASE-22903:
--

Committed to all branch-2.x branches. Thanks [~vjasani] for contributing.

> alter_status command is broken
> --
>
> Key: HBASE-22903
> URL: https://issues.apache.org/jira/browse/HBASE-22903
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, shell
>Affects Versions: 3.0.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: HBASE-22903.branch-2.000.patch, 
> HBASE-22903.branch-2.1.000.patch
>
>
> This is applicable to master branch only:
> {code:java}
> > alter_status 't1'
> ERROR: undefined method `getAlterStatus' for 
> #
> {code}
>  




[jira] [Updated] (HBASE-22903) alter_status command is broken

2019-10-07 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22903:
-
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> alter_status command is broken
> --
>
> Key: HBASE-22903
> URL: https://issues.apache.org/jira/browse/HBASE-22903
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, shell
>Affects Versions: 3.0.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: HBASE-22903.branch-2.000.patch, 
> HBASE-22903.branch-2.1.000.patch
>
>
> This is applicable to master branch only:
> {code:java}
> > alter_status 't1'
> ERROR: undefined method `getAlterStatus' for 
> #
> {code}
>  




[jira] [Updated] (HBASE-22903) alter_status command is broken

2019-10-07 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22903:
-
Fix Version/s: 2.2.2
   2.1.7
   2.3.0

> alter_status command is broken
> --
>
> Key: HBASE-22903
> URL: https://issues.apache.org/jira/browse/HBASE-22903
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, shell
>Affects Versions: 3.0.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: HBASE-22903.branch-2.000.patch, 
> HBASE-22903.branch-2.1.000.patch
>
>
> This is applicable to master branch only:
> {code:java}
> > alter_status 't1'
> ERROR: undefined method `getAlterStatus' for 
> #
> {code}
>  




[jira] [Commented] (HBASE-22903) alter_status command is broken

2019-10-07 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946378#comment-16946378
 ] 

Zheng Hu commented on HBASE-22903:
--

Sorry about the delay, we had a long vacation in China starting Oct 1. Fine, let 
me get this resolved. Thanks [~vjasani]. 

> alter_status command is broken
> --
>
> Key: HBASE-22903
> URL: https://issues.apache.org/jira/browse/HBASE-22903
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, shell
>Affects Versions: 3.0.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-22903.branch-2.000.patch, 
> HBASE-22903.branch-2.1.000.patch
>
>
> This is applicable to master branch only:
> {code:java}
> > alter_status 't1'
> ERROR: undefined method `getAlterStatus' for 
> #
> {code}
>  




[jira] [Commented] (HBASE-22903) alter_status command is broken

2019-09-30 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940794#comment-16940794
 ] 

Zheng Hu commented on HBASE-22903:
--

[~vjasani], mind checking branch-2, branch-2.1 and branch-2.2? It seems the 
patch for master cannot be applied to the branch-2.x branches now. Thanks.

> alter_status command is broken
> --
>
> Key: HBASE-22903
> URL: https://issues.apache.org/jira/browse/HBASE-22903
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, shell
>Affects Versions: 3.0.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-22903.master.000.patch, 
> HBASE-22903.master.001.patch, HBASE-22903.master.002.patch, 
> HBASE-22903.master.005.patch, HBASE-22903.master.006.patch
>
>
> This is applicable to master branch only:
> {code:java}
> > alter_status 't1'
> ERROR: undefined method `getAlterStatus' for 
> #
> {code}
>  




[jira] [Commented] (HBASE-22965) RS Crash due to DBE reference to an reused ByteBuff

2019-09-29 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940596#comment-16940596
 ] 

Zheng Hu commented on HBASE-22965:
--

Pushed to all 2.x branches. Thanks [~javaman_chen] for contributing. 

> RS Crash due to DBE reference to an reused ByteBuff
> ---
>
> Key: HBASE-22965
> URL: https://issues.apache.org/jira/browse/HBASE-22965
> Project: HBase
>  Issue Type: Bug
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: hs_regionserver_err_pid.log
>
>
> After introducing HBASE-21879 into our own branch and enabling data block 
> encoding with ROW_INDEX_V1, the RegionServer crashed (the crash log has been 
> uploaded).
> After reading RowIndexEncoderV1, I found that _lastCell_ may refer to a reused 
> ByteBuff, because DBE is not a listener of Shipper.
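
The hazard described here, holding a reference into a buffer that is later recycled, can be reproduced with plain `java.nio.ByteBuffer`. This is only an illustrative sketch under that assumption: the names `lastCellView` and `lastCellCopy` are invented, and it does not show RowIndexEncoderV1 or the actual fix. A view that shares storage with the pooled buffer silently changes when the buffer is reused, while a deep copy taken beforehand keeps the old bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StaleReferenceDemo {

  /** Decodes the readable bytes of a buffer without changing its position. */
  static String text(ByteBuffer b) {
    return StandardCharsets.US_ASCII.decode(b.duplicate()).toString();
  }

  /** Deep copy: the result has its own backing storage. */
  static ByteBuffer deepCopy(ByteBuffer src) {
    ByteBuffer copy = ByteBuffer.allocate(src.remaining());
    copy.put(src.duplicate());
    copy.flip();
    return copy;
  }

  public static void main(String[] args) {
    ByteBuffer pooled = ByteBuffer.allocate(4);
    pooled.put("row1".getBytes(StandardCharsets.US_ASCII)).flip();

    ByteBuffer lastCellView = pooled.duplicate(); // shares storage with pooled
    ByteBuffer lastCellCopy = deepCopy(pooled);   // independent storage

    // The pool recycles the buffer for the next block.
    pooled.clear();
    pooled.put("row2".getBytes(StandardCharsets.US_ASCII)).flip();

    System.out.println("view: " + text(lastCellView)); // now shows the new bytes
    System.out.println("copy: " + text(lastCellCopy)); // still shows the old bytes
  }
}
```

With off-heap buffers that are freed rather than overwritten, the same mistake can crash the process instead of returning wrong data.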




[jira] [Resolved] (HBASE-22965) RS Crash due to DBE reference to an reused ByteBuff

2019-09-29 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22965.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

> RS Crash due to DBE reference to an reused ByteBuff
> ---
>
> Key: HBASE-22965
> URL: https://issues.apache.org/jira/browse/HBASE-22965
> Project: HBase
>  Issue Type: Bug
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: hs_regionserver_err_pid.log
>
>
> After introducing HBASE-21879 into our own branch and enabling data block 
> encoding with ROW_INDEX_V1, the RegionServer crashed (the crash log has been 
> uploaded).
> After reading RowIndexEncoderV1, I found that _lastCell_ may refer to a reused 
> ByteBuff, because DBE is not a listener of Shipper.




[jira] [Updated] (HBASE-22965) RS Crash due to DBE reference to an reused ByteBuff

2019-09-29 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22965:
-
Fix Version/s: 2.2.2
   2.1.7
   2.3.0
   3.0.0

> RS Crash due to DBE reference to an reused ByteBuff
> ---
>
> Key: HBASE-22965
> URL: https://issues.apache.org/jira/browse/HBASE-22965
> Project: HBase
>  Issue Type: Bug
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: hs_regionserver_err_pid.log
>
>
> After introducing HBASE-21879 into our own branch and enabling data block 
> encoding with ROW_INDEX_V1, the RegionServer crashed (the crash log has been 
> uploaded).
> After reading RowIndexEncoderV1, I found that _lastCell_ may refer to a reused 
> ByteBuff, because DBE is not a listener of Shipper.




[jira] [Commented] (HBASE-23045) currentPath may be stitched in a loop in replication source code.

2019-09-29 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940361#comment-16940361
 ] 

Zheng Hu commented on HBASE-23045:
--

Yeah, mind providing a UT to reproduce this bug? I would appreciate that. 
Btw, mind putting the patch in a GitHub PR? [~gk_coder]

>  currentPath may be stitched in a loop in replication source code.
> --
>
> Key: HBASE-23045
> URL: https://issues.apache.org/jira/browse/HBASE-23045
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 1.2.6.1
>Reporter: kangkang.guo
>Assignee: kangkang.guo
>Priority: Critical
> Fix For: 1.2.6.1
>
> Attachments: HBASE-23045.branch-1.2.0001.patch
>
>
> When the openReader encounters a FileNotFoundException, we may go to all 
> possible directories to find the current hlog. When found, the path may be 
> wrong, and it is looped together.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Assigned] (HBASE-23075) Upgrade jackson version

2019-09-25 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-23075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-23075:


Assignee: Nicholas Jiang

> Upgrade jackson version
> ---
>
> Key: HBASE-23075
> URL: https://issues.apache.org/jira/browse/HBASE-23075
> Project: HBase
>  Issue Type: Improvement
>  Components: dependencies, REST
>Reporter: Nicholas Jiang
>Assignee: Nicholas Jiang
>Priority: Major
>
> A Polymorphic Typing issue was discovered in FasterXML jackson-databind 
> before 2.9.10. It is related to com.zaxxer.hikari.HikariDataSource. This is a 
> different vulnerability than CVE-2019-14540.
> https://nvd.nist.gov/vuln/detail/CVE-2019-16335
> A Polymorphic Typing issue was discovered in FasterXML jackson-databind 
> before 2.9.10. It is related to com.zaxxer.hikari.HikariConfig.
> https://nvd.nist.gov/vuln/detail/CVE-2019-14540




[jira] [Commented] (HBASE-23009) TestSnapshotScannerHDFSAclController is broken on branch-2

2019-09-10 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-23009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927164#comment-16927164
 ] 

Zheng Hu commented on HBASE-23009:
--

Ping [~meiyi]

> TestSnapshotScannerHDFSAclController is broken on branch-2
> --
>
> Key: HBASE-23009
> URL: https://issues.apache.org/jira/browse/HBASE-23009
> Project: HBase
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.3.0
>Reporter: Peter Somogyi
>Priority: Major
> Fix For: 2.3.0
>
>
> TestSnapshotScannerHDFSAclController.testCleanArchiveTableDir always fails on 
> branch-2.
> {noformat}
> java.lang.AssertionError at 
> org.apache.hadoop.hbase.security.access.TestSnapshotScannerHDFSAclController.testCleanArchiveTableDir(TestSnapshotScannerHDFSAclController.java:745)
>  {noformat}
> Test run: 
> [https://builds.apache.org/job/HBase-Flaky-Tests/job/branch-2/4148/testReport/junit/org.apache.hadoop.hbase.security.access/TestSnapshotScannerHDFSAclController/testCleanArchiveTableDir/]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Commented] (HBASE-22965) RS Crash due to RowIndexEncoderV1 reference to an reused ByteBuff

2019-09-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925548#comment-16925548
 ] 

Zheng Hu commented on HBASE-22965:
--

OK, it seems to be a bug. Mind preparing a patch for this? 

> RS Crash due to RowIndexEncoderV1 reference to an reused ByteBuff
> -
>
> Key: HBASE-22965
> URL: https://issues.apache.org/jira/browse/HBASE-22965
> Project: HBase
>  Issue Type: Bug
>Reporter: chenxu
>Priority: Major
> Attachments: hs_regionserver_err_pid.log
>
>
> After introducing HBASE-21879 into our own branch and enabling data block 
> encoding with ROW_INDEX_V1, the RegionServer crashed (the crash log has been 
> uploaded).
> After reading RowIndexEncoderV1, I found that _lastCell_ may refer to a reused 
> ByteBuff, because DBE is not a listener of Shipper.




[jira] [Commented] (HBASE-22802) Avoid temp ByteBuffer allocation in FileIOEngine#read

2019-09-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925533#comment-16925533
 ] 

Zheng Hu commented on HBASE-22802:
--

Pushed to branch-2 and master. Thanks [~javaman_chen] for contributing; please 
fill in the release note for the feature, [~javaman_chen]. 

> Avoid temp ByteBuffer allocation in FileIOEngine#read
> -
>
> Key: HBASE-22802
> URL: https://issues.apache.org/jira/browse/HBASE-22802
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-22802-master-v1.patch, profile_mem_alloc.png, 
> profile_mem_alloc_with_pool.png
>
>
> A temporary ByteBuffer is allocated each time FileIOEngine#read is called:
> {code:java}
> public Cacheable read(BucketEntry be) throws IOException {
>   long offset = be.offset();
>   int length = be.getLength();
>   Preconditions.checkArgument(length >= 0, "Length of read can not be less than 0.");
>   ByteBuffer dstBuffer = ByteBuffer.allocate(length);
>   ...
> }
> {code}
> We can avoid this by using ByteBuffAllocator#allocate(length) after 
> HBASE-21879.
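
As a rough illustration of reading into a reusable buffer instead of allocating one per call, here is a self-contained sketch in plain Java NIO. It is not the HBase implementation: `BufferPool` is a hypothetical fixed-size pool standing in for an allocator, and `readFully` is an invented helper that performs a positional read into a buffer taken from that pool.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;

public class PooledRead {

  /** Hypothetical fixed-size buffer pool; stands in for a real allocator. */
  static final class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
      this.bufferSize = bufferSize;
    }

    /** Reuses a released buffer if one is available, otherwise allocates. */
    synchronized ByteBuffer take() {
      ByteBuffer b = free.poll();
      return b != null ? b : ByteBuffer.allocateDirect(bufferSize);
    }

    synchronized void release(ByteBuffer b) {
      b.clear();
      free.push(b);
    }
  }

  /** Reads exactly length bytes at offset into dst, then flips dst for reading. */
  static void readFully(FileChannel ch, long offset, int length, ByteBuffer dst)
      throws IOException {
    if (length < 0 || length > dst.capacity()) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
    dst.clear();
    dst.limit(length);
    long pos = offset;
    while (dst.hasRemaining()) {
      int n = ch.read(dst, pos); // positional read, does not move the channel position
      if (n < 0) {
        throw new EOFException("unexpected end of file at " + pos);
      }
      pos += n;
    }
    dst.flip();
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("pooled-read", ".bin");
    try {
      Files.write(file, new byte[] {10, 20, 30, 40, 50});
      BufferPool pool = new BufferPool(64);
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer buf = pool.take();
        try {
          readFully(ch, 1, 3, buf);
          System.out.println("first byte read: " + buf.get(0));
        } finally {
          pool.release(buf); // return the buffer instead of leaving it to the GC
        }
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

The trade-off is that the caller now owns the buffer's lifecycle: it has to be released exactly once, and nothing may keep reading from it afterwards.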




[jira] [Updated] (HBASE-22802) Avoid temp ByteBuffer allocation in FileIOEngine#read

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22802:
-
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

> Avoid temp ByteBuffer allocation in FileIOEngine#read
> -
>
> Key: HBASE-22802
> URL: https://issues.apache.org/jira/browse/HBASE-22802
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-22802-master-v1.patch, profile_mem_alloc.png, 
> profile_mem_alloc_with_pool.png
>
>
> A temporary ByteBuffer is allocated each time FileIOEngine#read is called:
> {code:java}
> public Cacheable read(BucketEntry be) throws IOException {
>   long offset = be.offset();
>   int length = be.getLength();
>   Preconditions.checkArgument(length >= 0, "Length of read can not be less than 0.");
>   ByteBuffer dstBuffer = ByteBuffer.allocate(length);
>   ...
> }
> {code}
> We can avoid this by using ByteBuffAllocator#allocate(length) after 
> HBASE-21879.




[jira] [Updated] (HBASE-22802) Avoid temp ByteBuffer allocation in FileIOEngine#read

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22802:
-
Fix Version/s: 2.3.0
   3.0.0

> Avoid temp ByteBuffer allocation in FileIOEngine#read
> -
>
> Key: HBASE-22802
> URL: https://issues.apache.org/jira/browse/HBASE-22802
> Project: HBase
>  Issue Type: Improvement
>  Components: BucketCache
>Reporter: chenxu
>Assignee: chenxu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-22802-master-v1.patch, profile_mem_alloc.png, 
> profile_mem_alloc_with_pool.png
>
>
> A temporary ByteBuffer is allocated each time FileIOEngine#read is called:
> {code:java}
> public Cacheable read(BucketEntry be) throws IOException {
>   long offset = be.offset();
>   int length = be.getLength();
>   Preconditions.checkArgument(length >= 0, "Length of read can not be less than 0.");
>   ByteBuffer dstBuffer = ByteBuffer.allocate(length);
>   ...
> }
> {code}
> We can avoid this by using ByteBuffAllocator#allocate(length) after 
> HBASE-21879.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Resolved] (HBASE-22995) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22995.
--
Resolution: Duplicate

https://github.com/apache/hbase/pull/589 addresses this fix.

> The TestHRegionWithInMemoryFlush is broken
> --
>
> Key: HBASE-22995
> URL: https://issues.apache.org/jira/browse/HBASE-22995
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Major
> Attachments: HBASE-22995.v01.patch
>
>
> {code}
> 2019-09-09 15:40:58,764 INFO  [Time-limited test] regionserver.HRegion(1038): 
> Opened 9669fa00ab90e206bb967cd27169d0e5; next sequenceid=2
> 2019-09-09 15:40:58,771 INFO  [PutThread] regionserver.HRegion(8489): writing 
> data to region 
> testWritesWhileScanning,,1568014858732.9669fa00ab90e206bb967cd27169d0e5. with 
> WAL disabled. Data may be lost in the event of a crash.
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
> at 
> org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
> at 
> org.apache.hadoop.hbase.regionserver.TestHRegion$PutThread.run(TestHRegion.java:3704)
> 2019-09-09 15:40:58,842 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG

[jira] [Commented] (HBASE-22995) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925514#comment-16925514
 ] 

Zheng Hu commented on HBASE-22995:
--

OK, let me mark it as closed and get the HBASE-22979 patch in.

> The TestHRegionWithInMemoryFlush is broken
> --
>
> Key: HBASE-22995
> URL: https://issues.apache.org/jira/browse/HBASE-22995
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Major
> Attachments: HBASE-22995.v01.patch
>
>
> {code}
> 2019-09-09 15:40:58,764 INFO  [Time-limited test] regionserver.HRegion(1038): 
> Opened 9669fa00ab90e206bb967cd27169d0e5; next sequenceid=2
> 2019-09-09 15:40:58,771 INFO  [PutThread] regionserver.HRegion(8489): writing 
> data to region 
> testWritesWhileScanning,,1568014858732.9669fa00ab90e206bb967cd27169d0e5. with 
> WAL disabled. Data may be lost in the event of a crash.
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
> at 
> org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
> at 
> org.apache.hadoop.hbase.regionserver.TestHRegion$PutThread.run(TestHRegion.java:3704)
> 2019-09-09 15:40:58,842 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846

[jira] [Updated] (HBASE-22995) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22995:
-
Attachment: HBASE-22995.v01.patch

> The TestHRegionWithInMemoryFlush is broken
> --
>
> Key: HBASE-22995
> URL: https://issues.apache.org/jira/browse/HBASE-22995
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Major
> Attachments: HBASE-22995.v01.patch
>
>
> {code}
> 2019-09-09 15:40:58,764 INFO  [Time-limited test] regionserver.HRegion(1038): 
> Opened 9669fa00ab90e206bb967cd27169d0e5; next sequenceid=2
> 2019-09-09 15:40:58,771 INFO  [PutThread] regionserver.HRegion(8489): writing 
> data to region 
> testWritesWhileScanning,,1568014858732.9669fa00ab90e206bb967cd27169d0e5. with 
> WAL disabled. Data may be lost in the event of a crash.
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
> at 
> org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
> at 
> org.apache.hadoop.hbase.regionserver.TestHRegion$PutThread.run(TestHRegion.java:3704)
> 2019-09-09 15:40:58,842 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
> files, 0 compacting, 0 eligible, 16 blocking
> 2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
> compactions.SortedCompactionPolicy(66):

[jira] [Updated] (HBASE-22995) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22995:
-
Description: 
{code}
2019-09-09 15:40:58,764 INFO  [Time-limited test] regionserver.HRegion(1038): 
Opened 9669fa00ab90e206bb967cd27169d0e5; next sequenceid=2
2019-09-09 15:40:58,771 INFO  [PutThread] regionserver.HRegion(8489): writing 
data to region 
testWritesWhileScanning,,1568014858732.9669fa00ab90e206bb967cd27169d0e5. with 
WAL disabled. Data may be lost in the event of a crash.
Exception in thread "PutThread" java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
at 
org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
at 
org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
at 
org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
at 
org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
at 
org.apache.hadoop.hbase.regionserver.TestHRegion$PutThread.run(TestHRegion.java:3704)
2019-09-09 15:40:58,842 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,845 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,846 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:58,847 DEBUG [Time-limited test] 
compactions.SortedCompactionPolicy(66): Selecting compaction from 0 store 
files, 0 compacting, 0 eligible, 16 blocking
2019-09-09 15:40:59,837 WARN  [FlushThread] 
regionserver.MultiVersionConcurrencyControl(228): STUCK:

[jira] [Resolved] (HBASE-22994) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22994.
--
Resolution: Duplicate

Sorry for the noise; I created a duplicate issue. Closing it now.

> The TestHRegionWithInMemoryFlush is broken
> --
>
> Key: HBASE-22994
> URL: https://issues.apache.org/jira/browse/HBASE-22994
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Major
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
>   at 
> org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
>   at 
> org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
>   at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
>   at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
>   at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
>   at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
>   at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
>   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush.testFlushAndMemstoreSizeCounting(TestHRegionWithInMemoryFlush.java:85)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at

[jira] [Created] (HBASE-22995) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)

Zheng Hu created HBASE-22995:


 Summary: The TestHRegionWithInMemoryFlush is broken
 Key: HBASE-22995
 URL: https://issues.apache.org/jira/browse/HBASE-22995
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu


{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
at 
org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
at 
org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
at 
org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
at 
org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
at 
org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush.testFlushAndMemstoreSizeCounting(TestHRegionWithInMemoryFlush.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at

[jira] [Created] (HBASE-22994) The TestHRegionWithInMemoryFlush is broken

2019-09-09 Thread Zheng Hu (Jira)

Zheng Hu created HBASE-22994:


 Summary: The TestHRegionWithInMemoryFlush is broken
 Key: HBASE-22994
 URL: https://issues.apache.org/jira/browse/HBASE-22994
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu


{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:334)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:193)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:115)
at 
org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:176)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:334)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAdd(AbstractMemStore.java:157)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.doAddOrUpsert(AbstractMemStore.java:147)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:117)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:111)
at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:771)
at 
org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4474)
at 
org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:228)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3533)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3224)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3157)
at 
org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3216)
at 
org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3698)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4112)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:4045)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3976)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3967)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3981)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4308)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3100)
at 
org.apache.hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush.testFlushAndMemstoreSizeCounting(TestHRegionWithInMemoryFlush.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at

[jira] [Resolved] (HBASE-22912) [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn thousands of threads

2019-09-05 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22912.
--
Fix Version/s: 1.4.11
   1.5.0
 Hadoop Flags: Reviewed
 Release Note: 
Replace the ForkJoinPool in CleanerChore with a ThreadPoolExecutor, which can limit 
the number of spawned threads and avoid frequent GC on the master. The replacement is an 
internal implementation detail of CleanerChore, so no config key changes; upstream 
users can just upgrade the HBase master without any other change.

   Resolution: Fixed

Pushed to branch-1 & branch-1.4. Thanks [~reidchan] & [~apurtell] for 
reviewing. 
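
As a rough illustration of the idea in the release note above (bounding the number of worker threads with a ThreadPoolExecutor), here is a minimal sketch using only java.util.concurrent. The class name, the helper methods and the pool size of 4 are hypothetical; this is not the CleanerChore code.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedCleanerPoolSketch {

  // Hypothetical helper: core size == max size, so the pool never grows past maxThreads.
  // Extra tasks wait in the unbounded queue instead of spawning new threads.
  static ThreadPoolExecutor newBoundedPool(int maxThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    // Let idle threads exit so an idle pool holds no threads at all.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }

  // Runs taskCount trivial tasks and reports the largest number of threads the pool ever had.
  static int largestPoolSizeFor(int maxThreads, int taskCount) {
    ThreadPoolExecutor pool = newBoundedPool(maxThreads);
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < taskCount; i++) {
      pool.execute(done::incrementAndGet);
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return pool.getLargestPoolSize();
  }

  public static void main(String[] args) {
    // The reported size stays at or below the configured bound, however many tasks are queued.
    System.out.println("largest pool size: " + largestPoolSizeFor(4, 1000));
  }
}
```

The trade-off in this sketch is that queued tasks wait longer when the bound is small, in exchange for a predictable thread count.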

> [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn thousands 
> of threads
> --
>
> Key: HBASE-22912
> URL: https://issues.apache.org/jira/browse/HBASE-22912
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 1.5.0, 1.4.11
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Resolved] (HBASE-22937) The RawBytesComparator in branch-1 have wrong comparison order

2019-09-05 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22937.
--
Fix Version/s: 1.4.11
   1.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> The RawBytesComparator in branch-1 have wrong comparison order
> --
>
> Key: HBASE-22937
> URL: https://issues.apache.org/jira/browse/HBASE-22937
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 1.5.0, 1.4.11
>
>
> While digging into HBASE-22862, we found a bug in 
> RawBytesComparator#compareOnlyKeyPortion (although it's unrelated to the 
> corruption in HBASE-22862). 
> {code}
> @Override
> @VisibleForTesting
> public int compareOnlyKeyPortion(Cell left, Cell right) {
>   // ...
>   return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte());
> }
> {code}
> I think this should be (0xff & right.getTypeByte()) - (0xff & left.getTypeByte()).
> I can see that the BloomFilter or HFile v2 are still using the comparator in 
> branch-1 (but not in branch-2). Maybe we can just remove the class (if some 
> HFile was encoded with this comparator, map it to the correct KVComparator, 
> just like 2.x), or fix the bug in the current RawBytesComparator.




[jira] [Commented] (HBASE-22937) The RawBytesComparator in branch-1 have wrong comparison order

2019-09-05 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923295#comment-16923295
 ] 

Zheng Hu commented on HBASE-22937:
--

Pushed to branch-1 & branch-1.4. Thanks [~anoop.hbase] for reviewing.

> The RawBytesComparator in branch-1 have wrong comparison order
> --
>
> Key: HBASE-22937
> URL: https://issues.apache.org/jira/browse/HBASE-22937
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
>
> While digging into HBASE-22862, we found a bug in 
> RawBytesComparator#compareOnlyKeyPortion (although it's unrelated to the 
> corruption in HBASE-22862). 
> {code}
> @Override
> @VisibleForTesting
> public int compareOnlyKeyPortion(Cell left, Cell right) {
>   // ...
>   return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte());
> }
> {code}
> I think this should be (0xff & right.getTypeByte()) - (0xff & left.getTypeByte()).
> I can see that the BloomFilter or HFile v2 are still using the comparator in 
> branch-1 (but not in branch-2). Maybe we can just remove the class (if some 
> HFile was encoded with this comparator, map it to the correct KVComparator, 
> just like 2.x), or fix the bug in the current RawBytesComparator.




[jira] [Updated] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-09-04 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-21879:
-
Release Note: 
Before this issue, we had made the read path 100% offheap when blocks hit the 
BucketCache 100% of the time, but on a cache miss the RS needed to read the block 
via the on-heap API, which caused high young GC pressure.
With this issue, the block is read offheap even when it is read from the 
filesystem directly. This requires Hadoop version >= 2.9.3, but it also works 
with older Hadoop versions (reads still work fine, but the block is read 
on-heap). We have written a careful doc about the implementation, 
performance and practice here: 
https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E/edit#heading=h.nch5d72p27ex
Please read it for more details.
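
To make the on-heap versus offheap distinction in the release note concrete, here is a minimal sketch of a positional read (pread) straight into a direct ByteBuffer. It uses java.nio.channels.FileChannel on a local temp file as a stand-in; it is an assumption-laden illustration, not the HDFS ByteBuffer pread API and not the HFileBlock implementation. The class and method names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectBufferPreadSketch {

  // Reads exactly `length` bytes starting at `offset` into an off-heap (direct) buffer.
  // No intermediate on-heap byte[] is allocated for the block itself.
  static ByteBuffer pread(FileChannel channel, long offset, int length) throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(length);
    long pos = offset;
    while (buf.hasRemaining()) {
      int n = channel.read(buf, pos); // positional read; does not move the channel position
      if (n < 0) {
        throw new EOFException("hit end of file at position " + pos);
      }
      pos += n;
    }
    buf.flip();
    return buf;
  }

  // Demo helper: writes `payload` to a temp file, preads a slice, and decodes it.
  // The copy into a byte[] here exists only so the demo can build a String.
  static String roundTrip(String payload, long offset, int length) {
    try {
      Path file = Files.createTempFile("pread-sketch", ".bin");
      try {
        Files.write(file, payload.getBytes(StandardCharsets.US_ASCII));
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
          ByteBuffer buf = pread(channel, offset, length);
          byte[] out = new byte[buf.remaining()];
          buf.get(out);
          return new String(out, StandardCharsets.US_ASCII);
        }
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Skips the 7-byte "header|" prefix and reads the 10 bytes after it.
    System.out.println(roundTrip("header|block-data", 7, 10));
  }
}
```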

> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>     boolean updateMetrics) throws IOException {
>   // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], then 
> copy the on-heap byte[] to the offheap bucket cache asynchronously. In my 
> 100% get performance test, I also observed frequent young GC; the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of into a 
> byte[] to reduce young GC. We did not implement this before 
> because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> supports this now, so I think we can fix this now. 
> Will provide a patch and some perf comparison for this. 




[jira] [Resolved] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-09-04 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-21879.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>     boolean updateMetrics) throws IOException {
>   // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], then 
> copy the on-heap byte[] to the offheap bucket cache asynchronously. In my 
> 100% get performance test, I also observed frequent young GC; the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of into a 
> byte[] to reduce young GC. We did not implement this before 
> because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> supports this now, so I think we can fix this now. 
> Will provide a patch and some perf comparison for this. 




[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-09-02 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921107#comment-16921107
 ] 

Zheng Hu commented on HBASE-22862:
--

The problem is that I still cannot find a way to reproduce the bug in plain HBase. 
You said you encountered the bug frequently when executing some Phoenix SQL; 
is it possible to reproduce it in plain HBase, by a UT or some hbase shell commands? Or 
can you share the exact Phoenix SQL that reproduces it? (Then I think we can file a Phoenix 
JIRA to track this; I can help if any HBase support is needed.)
Thanks.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous", which causes 
> most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
>

[jira] [Commented] (HBASE-21879) Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose

2019-08-28 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917460#comment-16917460
 ] 

Zheng Hu commented on HBASE-21879:
--

OK, sounds reasonable. Let's mark it as resolved; I will attach the release 
note.

> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> --
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-21879.v1.patch, HBASE-21879.v1.patch, 
> QPS-latencies-before-HBASE-21879.png, gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following: 
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>     boolean updateMetrics) throws IOException {
>   // .
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap byte[], then 
> copy the on-heap byte[] to the offheap bucket cache asynchronously. In my 
> 100% get performance test, I also observed frequent young GC; the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of into a 
> byte[] to reduce young GC. We did not implement this before 
> because there was no ByteBuffer reading interface in the older HDFS client, but 2.7+ 
> supports this now, so I think we can fix this now. 
> Will provide a patch and some perf comparison for this. 




[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-28 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917459#comment-16917459
 ] 

Zheng Hu commented on HBASE-22862:
--

[~0x62ash], could you help confirm whether Phoenix in your code path uses an 
HBase coprocessor? (I'm not familiar with Phoenix.) Or does Phoenix 
throw any stack trace that locates the code path? What are your table & SQL? 
We may need some Phoenix folks to help locate the bug...

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous", which causes 
> most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917390#comment-16917390
 ] 

Zheng Hu commented on HBASE-22862:
--

bq. This is correct right - we need the type to be sorted in reverse order - 
Deletes to appear before puts. 
You can see the KeyValue#Type: 
{code}
  public static enum Type {
    Minimum((byte)0),
    Put((byte)4),

    Delete((byte)8),
    DeleteFamilyVersion((byte)10),
    DeleteColumn((byte)12),
    DeleteFamily((byte)14),

    // Maximum is used when searching; you look from maximum on down.
    Maximum((byte)255);

    private final byte code;
  }
{code}
and the CellComparator#compareWithoutRow impl: 
{code}
// Compare types. Let the delete types sort ahead of puts; i.e. types
// of higher numbers sort before those of lesser numbers. Maximum (255)
// appears ahead of everything, and minimum (0) appears after
// everything.
return (0xff & rightCell.getTypeByte()) - (0xff & leftCell.getTypeByte());
{code}
So if deletes are to appear before puts, I think it should be (0xff & 
right.getTypeByte()) - (0xff & left.getTypeByte()).
[~apurtell], OK, let me fix this in HBASE-22937.
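
The sign argument above can be checked with plain bytes. The following is a small self-contained sketch: the type codes are copied from the KeyValue#Type enum quoted above, while the class and method names are hypothetical and no HBase classes are used.

```java
public class TypeOrderSketch {
  // Type codes copied from the KeyValue#Type enum quoted in this comment.
  static final byte PUT = (byte) 4;
  static final byte DELETE_COLUMN = (byte) 12;
  static final byte MINIMUM = (byte) 0;
  static final byte MAXIMUM = (byte) 255;

  // Same operand order as the CellComparator#compareWithoutRow snippet:
  // higher type codes sort first, so deletes come before puts.
  static int compareTypesDescending(byte left, byte right) {
    return (0xff & right) - (0xff & left);
  }

  // Same operand order as the RawBytesComparator#compareOnlyKeyPortion snippet:
  // lower type codes sort first, so puts come before deletes.
  static int compareTypesAscending(byte left, byte right) {
    return (0xff & left) - (0xff & right);
  }

  public static void main(String[] args) {
    // Negative result means "left sorts before right".
    System.out.println("descending(DeleteColumn, Put) = "
        + compareTypesDescending(DELETE_COLUMN, PUT));
    System.out.println("ascending(DeleteColumn, Put)  = "
        + compareTypesAscending(DELETE_COLUMN, PUT));
    // The 0xff mask matters: (byte) 255 is -1 as a signed byte.
    System.out.println("descending(Maximum, Minimum)  = "
        + compareTypesDescending(MAXIMUM, MINIMUM));
  }
}
```

With the descending form, a DeleteColumn compares as smaller than a Put when everything else in the key is equal; with the ascending form the order flips.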

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous", which causes 
> most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
>

[jira] [Created] (HBASE-22937) The RawBytesComparator in branch-1 have wrong comparison order

2019-08-27 Thread Zheng Hu (Jira)

Zheng Hu created HBASE-22937:


 Summary: The RawBytesComparator in branch-1 have wrong comparison 
order
 Key: HBASE-22937
 URL: https://issues.apache.org/jira/browse/HBASE-22937
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu
Assignee: Zheng Hu


While digging into HBASE-22862, we found a bug in 
RawBytesComparator#compareOnlyKeyPortion (although it's unrelated to the 
corruption in HBASE-22862). 
{code}
@Override
@VisibleForTesting
public int compareOnlyKeyPortion(Cell left, Cell right) {
  // ...
  return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte());
}
{code}

I think this should be (0xff & right.getTypeByte()) - (0xff & left.getTypeByte()).

I can see that the BloomFilter or HFile v2 are still using the comparator in 
branch-1 (but not in branch-2). Maybe we can just remove the class (if some 
HFile was encoded with this comparator, map it to the correct KVComparator, 
just like 2.x), or fix the bug in the current RawBytesComparator.




[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917378#comment-16917378
 ] 

Zheng Hu commented on HBASE-22862:
--

bq. Does the VisibleForTesting annotation mean the method with the weird and 
possibly incorrect comparison is only used in unit tests?
No, I can see that the BloomFilter or HFile v2 are still using the comparator. 
Maybe we can just remove the class (if some HFile was encoded with this comparator, 
map it to the correct KVComparator, just like 2.x), or fix the bug in the 
current RawBytesComparator. Anyway, let me file a JIRA to address this.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous", which causes 
> most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
>

[jira] [Updated] (HBASE-22912) [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn thousands of threads

2019-08-27 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22912:
-
Summary: [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn 
thousands of threads  (was: [Backport] HBASE-22867 to branch-1)

> [Backport] HBASE-22867 to branch-1 to avoid ForkJoinPool to spawn thousands 
> of threads
> --
>
> Key: HBASE-22912
> URL: https://issues.apache.org/jira/browse/HBASE-22912
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Zheng Hu
>Priority: Major
>





[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916459#comment-16916459
 ] 

Zheng Hu commented on HBASE-22862:
--

[~0x62ash], how did you write the row and delete the column in HBase? Would 
you mind sharing your HBase client design? I think it would help us fix this 
bug.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916458#comment-16916458
 ] 

Zheng Hu commented on HBASE-22862:
--

Created a UT that exercises the whole RPC path (patch.v02); it also passes. 
I still haven't found a way to reproduce the bug.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>

[jira] [Updated] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-27 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22862:
-
Attachment: HBASE-22862.UT.v02.patch

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch, HBASE-22862.UT.v02.patch
>
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916358#comment-16916358
 ] 

Zheng Hu commented on HBASE-22862:
--

Just created a UT at the memstore level, and it works fine. There should be no 
problem at the memstore level.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch
>
>

[jira] [Updated] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22862:
-
Attachment: HBASE-22862.UT.v01.patch

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: HBASE-22862.UT.v01.patch
>
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916345#comment-16916345
 ] 

Zheng Hu commented on HBASE-22862:
--

Found the stacktrace message: 
{code}
Current cell = 
\x110090013098!>ct94a/d:apd/1565936313172/DeleteColumn/vlen=0/seqid=38334251,
 lastCell = 
\x110090013098!>ct94a/d:apd/1565936313172/Put/vlen=12/seqid=38338239
{code}
I think you issued a DeleteColumn first (it has the smaller seqid=38334251) and 
then a Put with the same rowkey, column and timestamp (seqid=38338239). It seems 
the snapshot memstore placed the Put ahead of the DeleteColumn, which violates 
the sort order. 
I will try to write a UT to reproduce this bug. 
Thanks [~0x62ash].
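
To make the ordering argument above concrete, here is a minimal, self-contained sketch. It does not use the real HBase classes: `SketchCell`, `CellOrderSketch` and `isInOrder` are hypothetical names, and the type codes (Put = 4, DeleteColumn = 12) and the tie-break rules (timestamp, type code and seqid all descending) are assumptions about how HBase orders cells. Rows and columns are compared as plain strings rather than bytes, which is enough for this illustration.

```java
import java.util.Comparator;

/** Simplified stand-in for an HBase cell; not the real Cell/KeyValue class. */
final class SketchCell {
    final String row;
    final String column;
    final long timestamp;
    final int typeCode;
    final long seqId;

    SketchCell(String row, String column, long timestamp, int typeCode, long seqId) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.typeCode = typeCode;
        this.seqId = seqId;
    }
}

final class CellOrderSketch {
    // Assumed type codes, used here only for illustration.
    static final int PUT = 4;
    static final int DELETE_COLUMN = 12;

    /** Row and column ascending; timestamp, type code and seqid descending. */
    static final Comparator<SketchCell> ORDER = new Comparator<SketchCell>() {
        @Override
        public int compare(SketchCell a, SketchCell b) {
            int c = a.row.compareTo(b.row);
            if (c != 0) return c;
            c = a.column.compareTo(b.column);
            if (c != 0) return c;
            c = Long.compare(b.timestamp, a.timestamp);   // newer timestamp first
            if (c != 0) return c;
            c = Integer.compare(b.typeCode, a.typeCode);  // higher type code first
            if (c != 0) return c;
            return Long.compare(b.seqId, a.seqId);        // newer seqid first
        }
    };

    /** Writer-style check: false when the current cell sorts before the last one. */
    static boolean isInOrder(SketchCell last, SketchCell current) {
        return ORDER.compare(last, current) <= 0;
    }

    public static void main(String[] args) {
        SketchCell deleteColumn =
            new SketchCell("row", "d:apd", 1565936313172L, DELETE_COLUMN, 38334251L);
        SketchCell put =
            new SketchCell("row", "d:apd", 1565936313172L, PUT, 38338239L);

        System.out.println(isInOrder(deleteColumn, put)); // prints "true"
        System.out.println(isInOrder(put, deleteColumn)); // prints "false"
    }
}
```

Under these assumptions the second call corresponds to the reported failure: the Put was appended first (lastCell) and the DeleteColumn arrived afterwards (current cell), so the order check rejects it.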

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916321#comment-16916321
 ] 

Zheng Hu commented on HBASE-22862:
--

OK, I found one bug in RawBytesComparator#compareOnlyKeyPortion (though it seems 
unrelated to this crash). [~stack], [~Apache9], [~apurtell] FYI:
{code}
@Override
@VisibleForTesting
public int compareOnlyKeyPortion(Cell left, Cell right) {
  // ...
  return (0xff & left.getTypeByte()) - (0xff & right.getTypeByte());
}
{code}
I think this should be (0xff & right.getTypeByte()) - (0xff & left.getTypeByte()) 
instead.
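
The difference between the two expressions can be shown with a small standalone sketch. `TypeByteOrderSketch` and its method names are hypothetical, and the type codes (Put = 4, DeleteColumn = 12) are assumed for illustration; the code only demonstrates how swapping the operands flips the sort direction of the type-byte tie-break.

```java
final class TypeByteOrderSketch {
    // Assumed type codes, used here only for illustration.
    static final byte PUT = 4;
    static final byte DELETE_COLUMN = 12;

    /** Tie-break as quoted from the comparator: ascending by type byte. */
    static int leftMinusRight(byte left, byte right) {
        return (0xff & left) - (0xff & right);
    }

    /** Tie-break as proposed in the comment: descending by type byte. */
    static int rightMinusLeft(byte left, byte right) {
        return (0xff & right) - (0xff & left);
    }

    public static void main(String[] args) {
        // A positive result means "left sorts after right".
        System.out.println(leftMinusRight(DELETE_COLUMN, PUT)); // prints "8"
        // A negative result means "left sorts before right".
        System.out.println(rightMinusLeft(DELETE_COLUMN, PUT)); // prints "-8"
    }
}
```

With the quoted expression a DeleteColumn sorts after a Put that has the same row, column and timestamp; with the proposed one it sorts before the Put. The `0xff` mask keeps the comparison unsigned, so a type byte above 127 is not treated as negative.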


> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
>

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916287#comment-16916287
 ] 

Zheng Hu commented on HBASE-22862:
--

Let me take a closer look. Thanks.

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Priority: Critical
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)

[jira] [Assigned] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-26 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-22862:


Assignee: Zheng Hu

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Assignee: Zheng Hu
>Priority: Critical
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at

[jira] [Assigned] (HBASE-22912) [Backport] HBASE-22867 to branch-1

2019-08-25 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-22912:


Assignee: Zheng Hu

> [Backport] HBASE-22867 to branch-1
> --
>
> Key: HBASE-22912
> URL: https://issues.apache.org/jira/browse/HBASE-22912
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Assignee: Zheng Hu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

[jira] [Updated] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-25 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22867:
-
Component/s: master

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Resolved] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-25 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22867.
--
Hadoop Flags: Reviewed
Release Note: Replace the ForkJoinPool in CleanerChore with a 
ThreadPoolExecutor, which limits the number of spawned threads and keeps the 
master from GCing frequently. The replacement is internal to CleanerChore, so 
there is no config key change; upstream users can simply upgrade the HBase 
master without any other change.
Tags: master
  Resolution: Fixed
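
The idea in the release note can be sketched as follows. This is a minimal illustration only, not the actual HBASE-22867 patch: the class name `BoundedCleanerPoolSketch`, the method `runTasks`, and the sleeping dummy task are hypothetical stand-ins for the directory-scan work in CleanerChore. It shows that a ThreadPoolExecutor with a fixed maximum size never grows beyond that size, no matter how many tasks are queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedCleanerPoolSketch {

  /**
   * Runs taskCount dummy tasks on a pool capped at maxThreads and returns
   * the largest number of threads the pool ever held.
   */
  static int runTasks(int maxThreads, int taskCount) {
    // core == max with an unbounded queue: extra tasks wait in the queue
    // instead of causing new threads to be spawned.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        maxThreads, maxThreads, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true);

    List<CompletableFuture<Void>> futures = new ArrayList<>();
    for (int i = 0; i < taskCount; i++) {
      // Hypothetical stand-in for scanning one directory.
      futures.add(CompletableFuture.runAsync(() -> {
        try {
          Thread.sleep(2);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, pool));
    }
    CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

    int largest = pool.getLargestPoolSize();
    pool.shutdown();
    return largest;
  }

  public static void main(String[] args) {
    System.out.println("largest pool size = " + runTasks(4, 200));
  }
}
```

The trade-off in this sketch is that a burst of tasks turns into queue length rather than thread count, so the number of threads the JVM has to stop at a safepoint stays bounded.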

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Commented] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-25 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915415#comment-16915415
 ] 

Zheng Hu commented on HBASE-22867:
--

OK, so let's address the backport in HBASE-22912; I will take care of it. 
I plan to close this JIRA now. Thanks [~Apache9] & [~reidchan] for reviewing.

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Fix For: 3.0.0, 2.3.0, 2.2.1, 2.1.6
>
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Commented] (HBASE-22880) [Backport] HBASE-22871 to branch-1

2019-08-23 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914007#comment-16914007
 ] 

Zheng Hu commented on HBASE-22880:
--

I think we also need to backport HBASE-22867 to branch-1; we can do that after 
resolving this issue.

> [Backport] HBASE-22871 to branch-1
> --
>
> Key: HBASE-22880
> URL: https://issues.apache.org/jira/browse/HBASE-22880
> Project: HBase
>  Issue Type: Improvement
>Reporter: Reid Chan
>Priority: Major
> Fix For: 1.5.0, 1.4.11
>
>





[jira] [Commented] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912995#comment-16912995
 ] 

Zheng Hu commented on HBASE-22867:
--

[~anoop.hbase], Yes, let me file a new issue to address the blocking 
getDeletableFiles() issue.

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Updated] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22810:
-
Fix Version/s: (was: 2.0.7)

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11
>
>
> In the EventType class we have the following definitions, which mean that 
> taking a snapshot and restoring a snapshot both use the 
> MASTER_TABLE_OPERATIONS executor now. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> see: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, so that CPs can enable or disable tables sequentially, we 
> create a ThreadPoolExecutor with threadPoolSize=1. As a result, we cannot 
> run snapshots concurrently even when they are for completely different 
> tables. For example, if there are two table snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait 5 min and starts its 
> snapshot only once Table A has finished.
> Because we set a snapshot timeout, the Table B snapshot can easily time out. 
> Instead, we can create a separate thread pool for snapshot operations only.
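
The serialization problem described above can be reproduced with plain `java.util.concurrent` classes. This is a hedged sketch, not HBase code: `SnapshotExecutorSketch`, `overlapsOnPool`, and the latch-based dummy "snapshot" task are hypothetical names for illustration. Two tasks are submitted to a pool, and each one reports whether the other started while it was still running.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class SnapshotExecutorSketch {

  /**
   * Submits two dummy "snapshot" tasks to a pool of the given size and
   * returns true only if both were running at the same time.
   */
  static boolean overlapsOnPool(int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      CountDownLatch bothStarted = new CountDownLatch(2);
      Callable<Boolean> snapshot = () -> {
        bothStarted.countDown();
        // True only if the other task starts before this one gives up.
        return bothStarted.await(500, TimeUnit.MILLISECONDS);
      };
      Future<Boolean> tableA = pool.submit(snapshot);
      Future<Boolean> tableB = pool.submit(snapshot);
      return tableA.get() && tableB.get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Pool of 1: comparable to sharing a single-threaded table-operations executor.
    System.out.println("single-thread pool overlaps: " + overlapsOnPool(1));
    // Pool of 2: comparable to a dedicated snapshot pool (size chosen arbitrarily).
    System.out.println("dedicated pool overlaps: " + overlapsOnPool(2));
  }
}
```

With one thread, the second task cannot start until the first has finished or timed out, which mirrors Table B waiting for Table A; a dedicated pool with more than one thread removes that wait without touching the single-threaded table-operations executor.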




[jira] [Resolved] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22810.
--
Resolution: Fixed

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11
>
>
> In the EventType class we have the following definitions, which mean that 
> taking a snapshot and restoring a snapshot both use the 
> MASTER_TABLE_OPERATIONS executor now. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> see: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, so that CPs can enable or disable tables sequentially, we 
> create a ThreadPoolExecutor with threadPoolSize=1. As a result, we cannot 
> run snapshots concurrently even when they are for completely different 
> tables. For example, if there are two table snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait 5 min and starts its 
> snapshot only once Table A has finished.
> Because we set a snapshot timeout, the Table B snapshot can easily time out. 
> Instead, we can create a separate thread pool for snapshot operations only.




[jira] [Commented] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912993#comment-16912993
 ] 

Zheng Hu commented on HBASE-22810:
--

Committed the addendum to all branches, Thanks all.

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class we have the following definitions, which mean that 
> taking a snapshot and restoring a snapshot both use the 
> MASTER_TABLE_OPERATIONS executor now. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> see: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, so that CPs can enable or disable tables sequentially, we 
> create a ThreadPoolExecutor with threadPoolSize=1. As a result, we cannot 
> run snapshots concurrently even when they are for completely different 
> tables. For example, if there are two table snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait 5 min and starts its 
> snapshot only once Table A has finished.
> Because we set a snapshot timeout, the Table B snapshot can easily time out. 
> Instead, we can create a separate thread pool for snapshot operations only.




[jira] [Commented] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912869#comment-16912869
 ] 

Zheng Hu commented on HBASE-22867:
--

Attached two jstack files: 191318.stack and 191318.stack.1. 
I captured 191318.stack first and 191318.stack.1 a few seconds later. In the 
first file we can clearly see 6 threads in dir-scan-pool that are blocked, 
waiting on SnapshotHFileCleaner#getDeletableFiles.  
{code}
"dir-scan-pool4-thread-8" #18765 daemon prio=5 os_prio=0 tid=0x7f4a20009c60 nid=0x6576 waiting for monitor entry [0x7f48a6191000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:68)
        - waiting to lock <0x00034411dc88> (a org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:295)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:405)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$187/1141106127.act(Unknown Source)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:460)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:405)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:414)
        at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$185/2070209024.run(Unknown Source)
        at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
        - <0x00038a476bf8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
In the second file, the threads have finished all their work and are waiting 
for new tasks. That means the cleaner is no longer blocked, which seems good.
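
The BLOCKED state in the jstack above is what the JVM reports when several workers contend for one object monitor. A minimal sketch of that effect follows; `BlockedCleanerSketch`, `SlowCleaner`, and `countBlockedWorkers` are hypothetical names, and the empty synchronized method only stands in for the real SnapshotHFileCleaner#getDeletableFiles, whose body is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class BlockedCleanerSketch {

  /** Hypothetical stand-in for a cleaner whose method holds the object monitor. */
  static class SlowCleaner {
    synchronized void getDeletableFiles() {
      // Real work would happen here while holding the monitor.
    }
  }

  /**
   * Starts the given number of workers that call the synchronized method
   * while the caller holds the monitor, and returns how many of them the
   * JVM reports as BLOCKED.
   */
  static int countBlockedWorkers(int workers) {
    SlowCleaner cleaner = new SlowCleaner();
    List<Thread> threads = new ArrayList<>();
    int blocked = 0;
    synchronized (cleaner) {
      for (int i = 0; i < workers; i++) {
        Thread t = new Thread(cleaner::getDeletableFiles, "dir-scan-sketch-" + i);
        t.setDaemon(true);
        t.start();
        threads.add(t);
      }
      // Poll until every worker is parked on the monitor (or 5 s elapse).
      long deadline = System.nanoTime() + 5_000_000_000L;
      while (blocked < workers && System.nanoTime() < deadline) {
        blocked = 0;
        for (Thread t : threads) {
          if (t.getState() == Thread.State.BLOCKED) {
            blocked++;
          }
        }
        Thread.yield();
      }
    }
    // The monitor is released here, so the workers can finish.
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return blocked;
  }

  public static void main(String[] args) {
    System.out.println("blocked workers = " + countBlockedWorkers(6));
  }
}
```

Once the monitor holder leaves the synchronized section, the workers drain one at a time, which matches the second dump where the pool threads are idle again.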

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Updated] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-21 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22867:
-
Attachment: 191318.stack.1

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 191318.stack, 191318.stack.1, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so mark it as a blocker.




[jira] [Updated] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-21 Thread Zheng Hu (Jira)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22867:
-
Attachment: 191318.stack

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 191318.stack, 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also captured a jstack dump: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.




[jira] [Comment Edited] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912278#comment-16912278
 ] 

Zheng Hu edited comment on HBASE-22810 at 8/21/19 1:13 PM:
---

Created a PR to unify the two config keys discussed above and add a unit test:
https://github.com/apache/hbase/pull/517


was (Author: openinx):
Created a PR to address the above uniform & add a UT.

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class we have the following definitions, which mean that 
> taking and restoring snapshots currently use the MASTER_TABLE_OPERATIONS 
> executor. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> saw: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, because coprocessors need table enable and disable to run 
> sequentially, we create a ThreadPoolExecutor with threadPoolSize=1. As a 
> result, snapshots cannot run concurrently even when they target completely 
> different tables. For example, if there are two snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait those 5 min and starts its 
> snapshot only once Table A has finished.
> Since we also set a snapshot timeout, the snapshot of Table B can easily time 
> out. Instead, we can create a separate thread pool for snapshot operations 
> only.
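
To make the queuing effect described above concrete, here is a minimal sketch using only java.util.concurrent rather than HBase's own ExecutorService. The class name, the pool sizes and the simulated snapshot durations are assumptions chosen for illustration; it is not the change made in the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SnapshotPoolSketch {
  /**
   * Runs `tables` simulated snapshots, each taking `millisEach`, on a pool
   * with `threads` workers and returns the elapsed wall time in milliseconds.
   */
  public static long runSnapshots(int threads, int tables, long millisEach) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<Void>> tasks = new ArrayList<>();
      for (int i = 0; i < tables; i++) {
        // Thread.sleep stands in for the real snapshot work (hypothetical).
        tasks.add(() -> { Thread.sleep(millisEach); return null; });
      }
      long start = System.nanoTime();
      // invokeAll blocks until every task has completed.
      for (Future<Void> f : pool.invokeAll(tasks)) {
        f.get();
      }
      return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // One worker: the snapshots queue behind each other.
    System.out.println("single-thread pool: " + runSnapshots(1, 3, 200) + " ms");
    // Three workers: snapshots of different tables overlap.
    System.out.println("pool of 3:          " + runSnapshots(3, 3, 200) + " ms");
  }
}
```

With one worker the total time grows with the number of tables, which is how a later request can hit the snapshot timeout while it is still waiting in the queue.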




[jira] [Commented] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912278#comment-16912278
 ] 

Zheng Hu commented on HBASE-22810:
--

Created a PR to unify the two config keys discussed above and add a unit test.

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class we have the following definitions, which mean that 
> taking and restoring snapshots currently use the MASTER_TABLE_OPERATIONS 
> executor. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> saw: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, because coprocessors need table enable and disable to run 
> sequentially, we create a ThreadPoolExecutor with threadPoolSize=1. As a 
> result, snapshots cannot run concurrently even when they target completely 
> different tables. For example, if there are two snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait those 5 min and starts its 
> snapshot only once Table A has finished.
> Since we also set a snapshot timeout, the snapshot of Table B can easily time 
> out. Instead, we can create a separate thread pool for snapshot operations 
> only.




[jira] [Commented] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-21 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912256#comment-16912256
 ] 

Zheng Hu commented on HBASE-22810:
--

Thanks [~stack] for the fix.  I read the UT code, and it is indeed a test that 
can easily be flaky.  For example, if all snapshot requests have been submitted 
but the snapshots are a bit slow, none will have completed when the assertion 
runs: 
{code}
+assertTrue("We expect at least 1 request to be rejected because of we concurrently" +
+" issued many requests", takenSize < ssNum && takenSize > 0);
{code}
Then the assertion fails, so +1 from me for removing it (I guess this is easier 
to hit now that 'hbase.master.executor.snapshot.threads' has been increased).

[~an...@apache.org], thanks for the reminder.  It's true that there are two 
different config keys for the snapshot thread pool size, but I think they have 
different meanings:
1. hbase.master.executor.snapshot.threads: how many snapshot requests from 
clients the master can handle at the same time; 
2. hbase.snapshot.master.threads: how many snapshot procedures the master can 
coordinate with the region servers. 
Key #1 limits all snapshot requests, while key #2 limits only the snapshot 
procedure with the RS (which is one part of a snapshot request).  Maybe we can 
unify the two config keys into one?  We would still initialize two different 
thread pools, with the same thread size, for different purposes.
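
One possible way to unify the two keys is to read a primary key and fall back to the other when it is unset. The sketch below uses java.util.Properties as a stand-in for the HBase Configuration class; the class name, the fallback order and the default of 3 are assumptions for illustration, not what the PR actually does.

```java
import java.util.Properties;

public class SnapshotThreadsConfigSketch {
  // The two key names come from the comment above.
  public static final String REQUEST_THREADS_KEY = "hbase.master.executor.snapshot.threads";
  public static final String PROCEDURE_THREADS_KEY = "hbase.snapshot.master.threads";
  // Hypothetical default, used only when neither key is set.
  public static final int ASSUMED_DEFAULT = 3;

  /** Returns the value of `primary`, else the value of `fallback`, else the assumed default. */
  public static int resolve(Properties conf, String primary, String fallback) {
    String value = conf.getProperty(primary);
    if (value == null) {
      value = conf.getProperty(fallback);
    }
    return value == null ? ASSUMED_DEFAULT : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(PROCEDURE_THREADS_KEY, "5");
    // Only the procedure key is set, so both pools would be sized from it.
    System.out.println(resolve(conf, REQUEST_THREADS_KEY, PROCEDURE_THREADS_KEY));
  }
}
```

Both thread pools could then be sized from the same resolved value while remaining separate pools.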



> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.2.1, 2.1.6, 1.3.6, 1.4.11, 2.0.7
>
>
> In the EventType class we have the following definitions, which mean that 
> taking and restoring snapshots currently use the MASTER_TABLE_OPERATIONS 
> executor. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> saw: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, because coprocessors need table enable and disable to run 
> sequentially, we create a ThreadPoolExecutor with threadPoolSize=1. As a 
> result, snapshots cannot run concurrently even when they target completely 
> different tables. For example, if there are two snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait those 5 min and starts its 
> snapshot only once Table A has finished.
> Since we also set a snapshot timeout, the snapshot of Table B can easily time 
> out. Instead, we can create a separate thread pool for snapshot operations 
> only.




[jira] [Commented] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-20 Thread Zheng Hu (Jira)



[ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911136#comment-16911136
 ] 

Zheng Hu commented on HBASE-22867:
--

I agree there are two problems here: 
1. there is no limit on the number of threads in the FJP (ForkJoinPool);
2. getDeletableFiles. 
I plan to fix them in two separate issues, because they seem to be two 
different problems.
Thanks.
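
For the first problem, the point of a thread limit can be shown with a plain fixed-size pool. This is a minimal sketch with a hypothetical class name and task body, not the CleanerChore code: a ForkJoinPool may add compensating threads when its workers block, which is one plausible reading of "no limit", whereas a fixed pool never runs more workers than its configured size.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BoundedCleanerPoolSketch {
  /** Runs `tasks` short tasks on a fixed pool and returns how many distinct worker threads ran them. */
  public static int distinctWorkers(int poolSize, int tasks) throws InterruptedException {
    Set<String> workers = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (int i = 0; i < tasks; i++) {
      pool.execute(() -> {
        // Record which worker thread picked up this task.
        workers.add(Thread.currentThread().getName());
        try {
          Thread.sleep(1); // stand-in for per-directory cleaning work (hypothetical)
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
      throw new IllegalStateException("tasks did not finish in time");
    }
    return workers.size();
  }

  public static void main(String[] args) throws InterruptedException {
    // Never more than 4 workers, no matter how many tasks are queued.
    System.out.println(distinctWorkers(4, 200));
  }
}
```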

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also captured a jstack dump: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.




[jira] [Commented] (HBASE-22810) Initialize an separate ThreadPoolExecutor for taking/restoring snapshot

2019-08-18 Thread Zheng Hu (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-22810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909963#comment-16909963
 ] 

Zheng Hu commented on HBASE-22810:
--

Fine.  Sorry, I think I forgot about this issue. Let me dig into it.

> Initialize an separate ThreadPoolExecutor for taking/restoring snapshot 
> 
>
> Key: HBASE-22810
> URL: https://issues.apache.org/jira/browse/HBASE-22810
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.3.6, 1.4.11
>
>
> In the EventType class we have the following definitions, which mean that 
> taking and restoring snapshots currently use the MASTER_TABLE_OPERATIONS 
> executor. 
> {code}
>   /**
>    * Messages originating from Client to Master.
>    * C_M_SNAPSHOT_TABLE
>    * Client asking Master to snapshot an offline table.
>    */
>   C_M_SNAPSHOT_TABLE(48, ExecutorType.MASTER_TABLE_OPERATIONS),
>   /**
>    * Messages originating from Client to Master.
>    * C_M_RESTORE_SNAPSHOT
>    * Client asking Master to restore a snapshot.
>    */
>   C_M_RESTORE_SNAPSHOT  (49, ExecutorType.MASTER_TABLE_OPERATIONS),
> {code}
> But when I checked the MASTER_TABLE_OPERATIONS thread pool initialization, I 
> saw: 
> {code}
>   private void startServiceThreads() throws IOException {
>     // ... some other initialization code
>     // We depend on there being only one instance of this executor running
>     // at a time.  To do concurrency, would need fencing of enable/disable of
>     // tables.
>     // Any time changing this maxThreads to > 1, pls see the comment at
>     // AccessController#postCompletedCreateTableAction
>     this.executorService.startExecutorService(ExecutorType.MASTER_TABLE_OPERATIONS, 1);
>     startProcedureExecutor();
> {code}
> That is to say, because coprocessors need table enable and disable to run 
> sequentially, we create a ThreadPoolExecutor with threadPoolSize=1. As a 
> result, snapshots cannot run concurrently even when they target completely 
> different tables. For example, if there are two snapshot requests and Table A 
> takes 5 min to snapshot, then Table B has to wait those 5 min and starts its 
> snapshot only once Table A has finished.
> Since we also set a snapshot timeout, the snapshot of Table B can easily time 
> out. Instead, we can create a separate thread pool for snapshot operations 
> only.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Resolved] (HBASE-22841) TimeRange's factory functions do not support ranges, only `allTime` and `at`

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22841.
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: 
Add several APIs to the TimeRange class to avoid using the deprecated TimeRange 
constructors: 
* TimeRange#from: Represents the time interval [minStamp, Long.MAX_VALUE)
* TimeRange#until: Represents the time interval [0, maxStamp)
* TimeRange#between: Represents the time interval [minStamp, maxStamp)

[~huonw], I have granted you contributor permission and assigned this JIRA to 
you. Thanks for your contribution.
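
The half-open interval semantics listed in the release note can be illustrated with a small stand-in class. This is a sketch only: HalfOpenRangeSketch and its contains method are hypothetical names, and it mirrors just the three intervals stated above rather than the real org.apache.hadoop.hbase.io.TimeRange implementation.

```java
public final class HalfOpenRangeSketch {
  private final long minStamp; // inclusive lower bound
  private final long maxStamp; // exclusive upper bound

  private HalfOpenRangeSketch(long minStamp, long maxStamp) {
    if (minStamp < 0 || maxStamp < minStamp) {
      throw new IllegalArgumentException("invalid range [" + minStamp + ", " + maxStamp + ")");
    }
    this.minStamp = minStamp;
    this.maxStamp = maxStamp;
  }

  /** [minStamp, Long.MAX_VALUE) */
  public static HalfOpenRangeSketch from(long minStamp) {
    return new HalfOpenRangeSketch(minStamp, Long.MAX_VALUE);
  }

  /** [0, maxStamp) */
  public static HalfOpenRangeSketch until(long maxStamp) {
    return new HalfOpenRangeSketch(0L, maxStamp);
  }

  /** [minStamp, maxStamp) */
  public static HalfOpenRangeSketch between(long minStamp, long maxStamp) {
    return new HalfOpenRangeSketch(minStamp, maxStamp);
  }

  /** True when the timestamp lies inside the half-open interval. */
  public boolean contains(long timestamp) {
    return timestamp >= minStamp && timestamp < maxStamp;
  }

  public static void main(String[] args) {
    HalfOpenRangeSketch r = between(10, 20);
    // The lower bound is included, the upper bound is not.
    System.out.println(r.contains(10) + " " + r.contains(20));
  }
}
```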

> TimeRange's factory functions do not support ranges, only `allTime` and `at`
> 
>
> Key: HBASE-22841
> URL: https://issues.apache.org/jira/browse/HBASE-22841
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 2.2.0, 2.1.5
>Reporter: Huon Wilson
>Assignee: Huon Wilson
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
>
> The {{org.apache.hadoop.hbase.io.TimeRange}} is used in functions like 
> {{org.apache.hadoop.hbase.client.Table.CheckAndMutateBuilder#timeRange}}.
> The current ways to create a {{TimeRange}} are:
> - factory functions: {{at}} (a single instant), {{allTime}} (all valid 
> timestamps)
> - deprecated and {{@InterfaceAudience.Private}} constructors, which support 
> more ranges, like {{[minStamp, maxStamp)}}, and {{[minStamp, MAX)}}
> This is insufficient for all but the simplest use of the 
> {{CheckAndMutateBuilder#timeRange}} function.
> On u...@hbase.apache.org, it was suggested that more factory functions could 
> be added: 
> https://lists.apache.org/thread.html/0ffc5e57c396873d56e49d7b02e823432b053fb98037ee6778d7c2ce@%3Cuser.hbase.apache.org%3E
> However, {{TimeRange}}'s documentation currently says:
> {code:java}
>  * Can be returned and read by clients.  Should not be directly created by 
> clients.
>  * Thus, all constructors are purposely @InterfaceAudience.Private.
> {code}
> so another approach to making {{CheckAndMutateBuilder#timeRange}} useful may 
> be required.




[jira] [Updated] (HBASE-22841) TimeRange's factory functions do not support ranges, only `allTime` and `at`

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22841:
-
Fix Version/s: 2.1.6
   2.2.1
   2.0.6
   2.3.0
   3.0.0

> TimeRange's factory functions do not support ranges, only `allTime` and `at`
> 
>
> Key: HBASE-22841
> URL: https://issues.apache.org/jira/browse/HBASE-22841
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 2.2.0, 2.1.5
>Reporter: Huon Wilson
>Assignee: Huon Wilson
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6
>
>
> The {{org.apache.hadoop.hbase.io.TimeRange}} is used in functions like 
> {{org.apache.hadoop.hbase.client.Table.CheckAndMutateBuilder#timeRange}}.
> The current ways to create a {{TimeRange}} are:
> - factory functions: {{at}} (a single instant), {{allTime}} (all valid 
> timestamps)
> - deprecated and {{@InterfaceAudience.Private}} constructors, which support 
> more ranges, like {{[minStamp, maxStamp)}}, and {{[minStamp, MAX)}}
> This is insufficient for all but the simplest use of the 
> {{CheckAndMutateBuilder#timeRange}} function.
> On u...@hbase.apache.org, it was suggested that more factory functions could 
> be added: 
> https://lists.apache.org/thread.html/0ffc5e57c396873d56e49d7b02e823432b053fb98037ee6778d7c2ce@%3Cuser.hbase.apache.org%3E
> However, {{TimeRange}}'s documentation currently says:
> {code:java}
>  * Can be returned and read by clients.  Should not be directly created by 
> clients.
>  * Thus, all constructors are purposely @InterfaceAudience.Private.
> {code}
> so another approach to making {{CheckAndMutateBuilder#timeRange}} useful may 
> be required.




[jira] [Assigned] (HBASE-22841) TimeRange's factory functions do not support ranges, only `allTime` and `at`

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-22841:


Assignee: Huon Wilson

> TimeRange's factory functions do not support ranges, only `allTime` and `at`
> 
>
> Key: HBASE-22841
> URL: https://issues.apache.org/jira/browse/HBASE-22841
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 2.2.0, 2.1.5
>Reporter: Huon Wilson
>Assignee: Huon Wilson
>Priority: Major
>
> The {{org.apache.hadoop.hbase.io.TimeRange}} is used in functions like 
> {{org.apache.hadoop.hbase.client.Table.CheckAndMutateBuilder#timeRange}}.
> The current ways to create a {{TimeRange}} are:
> - factory functions: {{at}} (a single instant), {{allTime}} (all valid 
> timestamps)
> - deprecated and {{@InterfaceAudience.Private}} constructors, which support 
> more ranges, like {{[minStamp, maxStamp)}}, and {{[minStamp, MAX)}}
> This is insufficient for all but the simplest use of the 
> {{CheckAndMutateBuilder#timeRange}} function.
> On u...@hbase.apache.org, it was suggested that more factory functions could 
> be added: 
> https://lists.apache.org/thread.html/0ffc5e57c396873d56e49d7b02e823432b053fb98037ee6778d7c2ce@%3Cuser.hbase.apache.org%3E
> However, {{TimeRange}}'s documentation currently says:
> {code:java}
>  * Can be returned and read by clients.  Should not be directly created by 
> clients.
>  * Thus, all constructors are purposely @InterfaceAudience.Private.
> {code}
> so another approach to making {{CheckAndMutateBuilder#timeRange}} useful may 
> be required.




[jira] [Assigned] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu reassigned HBASE-22867:


Assignee: Zheng Hu

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Critical
> Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also captured a jstack dump: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.




[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-16 Thread Zheng Hu (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909529#comment-16909529
 ] 

Zheng Hu commented on HBASE-22862:
--

Is this similar to HBASE-16931?  The current issue occurs during flush, while 
HBASE-16931 seems to occur during compaction.
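
The error quoted in the report below comes from an ordering check on consecutive cells. The sketch that follows is a simplified model, not the HBase comparator: it assumes keys sort by row, then column, then newest timestamp first, then higher type code first, and the type codes 4 (Put) and 12 (DeleteColumn) are assumptions as well. Under those assumptions, a DeleteColumn appended after a Put with the same row, column and timestamp is out of order, which matches the pair of cells in the log.

```java
public class CellOrderSketch {
  // Assumed type codes, for illustration only.
  public static final int PUT = 4;
  public static final int DELETE_COLUMN = 12;

  public static final class Key {
    final String row;
    final String column;
    final long timestamp;
    final int type;

    public Key(String row, String column, long timestamp, int type) {
      this.row = row;
      this.column = column;
      this.timestamp = timestamp;
      this.type = type;
    }
  }

  /** Assumed order: row asc, column asc, timestamp desc, type code desc. */
  public static int compare(Key a, Key b) {
    int c = a.row.compareTo(b.row);
    if (c != 0) return c;
    c = a.column.compareTo(b.column);
    if (c != 0) return c;
    c = Long.compare(b.timestamp, a.timestamp); // newer timestamps sort first
    if (c != 0) return c;
    return Integer.compare(b.type, a.type);     // higher type codes sort first
  }

  /** Rejects a key that sorts strictly before the previously appended key. */
  public static void checkKey(Key last, Key current) {
    if (last != null && compare(last, current) > 0) {
      throw new IllegalStateException("Added a key not lexically larger than previous");
    }
  }

  public static void main(String[] args) {
    Key put = new Key("row", "d:elr", 1565881054828L, PUT);
    Key deleteColumn = new Key("row", "d:elr", 1565881054828L, DELETE_COLUMN);
    checkKey(deleteColumn, put); // accepted under the assumed order
    checkKey(put, deleteColumn); // throws IllegalStateException
  }
}
```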

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed 
> mode)
> {code}
>Reporter: Alex Batyrshin
>Priority: Critical
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: 
> Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed 
> flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current 
> cell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567,
>  lastCell = 
> \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>at 
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>

[jira] [Commented] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-16 Thread Zheng Hu (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908959#comment-16908959
 ] 

Zheng Hu commented on HBASE-22867:
--

It seems not; you can see the thread dump in the attachment. I'm still 
reading the implementation of ForkJoinPool. Alternatively, we may plan to 
replace every ForkJoinPool with a regular thread pool.

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Blocker
> Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also captured a jstack dump: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.




[jira] [Updated] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22867:
-
Description: 
The thousands of spawned threads make a safepoint cost 80+s in our Master 
JVM process.
{code}
2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection 
and attempting reconnect
{code}

The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
{code}
vmop[threads: total initially_running wait_to_block]
[time: spin block sync cleanup vmop] page_trap_count
32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
{code}

We also captured a jstack dump: 
{code}
$ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
8648
{code}

It's a dangerous bug, so I'm marking it as a blocker.


  was:
The thousands of spawned threads make a safepoint cost 80+s in our Master 
JVM process.
{code}
2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection 
and attempting reconnect
{code}

The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
{code}
vmop[threads: total initially_running wait_to_block]
[time: spin block sync cleanup vmop] page_trap_count
32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
{code}

We also captured a jstack dump: 
{code}
$ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
8648
{code}



> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Blocker
> Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make a safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> The JVM stdout shows 9126 threads and a safepoint sync cost of 80+s:
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also captured a jstack dump: 
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}
> It's a dangerous bug, so I'm marking it as a blocker.




[jira] [Updated] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-16 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22867:
-
Attachment: 31162.stack.1

> The ForkJoinPool in CleanerChore will spawn thousands of threads in our 
> cluster with thousands table
> 
>
> Key: HBASE-22867
> URL: https://issues.apache.org/jira/browse/HBASE-22867
> Project: HBase
>  Issue Type: Bug
>Reporter: Zheng Hu
>Priority: Blocker
> Attachments: 31162.stack.1
>
>
> The thousands of spawned threads make the safepoint cost 80+s in our Master 
> JVM process.
> {code}
> 2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
> org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard 
> from server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket 
> connection and attempting reconnect
> {code}
> {code}
> The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
> {code}
> vmop[threads: total initially_running wait_to_block]
> [time: spin block sync cleanup vmop] page_trap_count
> 32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
> {code}
> We also got the jstack:
> {code}
> $ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
> 8648
> {code}




[jira] [Created] (HBASE-22867) The ForkJoinPool in CleanerChore will spawn thousands of threads in our cluster with thousands table

2019-08-16 Thread Zheng Hu (JIRA)

Zheng Hu created HBASE-22867:


 Summary: The ForkJoinPool in CleanerChore will spawn thousands of 
threads in our cluster with thousands table
 Key: HBASE-22867
 URL: https://issues.apache.org/jira/browse/HBASE-22867
 Project: HBase
  Issue Type: Bug
Reporter: Zheng Hu


The thousands of spawned threads make the safepoint cost 80+s in our Master 
JVM process.
{code}
2019-08-15,19:35:35,861 INFO [main-SendThread(zjy-hadoop-prc-zk02.bj:11000)] 
org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from 
server in 82260ms for sessionid 0x1691332e2d3aae5, closing socket connection 
and attempting reconnect
{code}
{code}

The stdout from the JVM (it shows 9126 threads and a sync cost of 80+s):
{code}
vmop[threads: total initially_running wait_to_block]
[time: spin block sync cleanup vmop] page_trap_count
32358.859: ForceAsyncSafepoint  [9126 67 474]  [ 128 8659687   101]  0
{code}

We also got the jstack:
{code}
$ cat 31162.stack.1  | grep 'ForkJoinPool-1-worker' | wc -l
8648
{code}





[jira] [Updated] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-15 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu updated HBASE-22862:
-
Priority: Critical  (was: Major)

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed mode)
> {code}
>Reporter: Alex Batyrshin
>Priority: Critical
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>at 
>
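
A note on why the writer rejects this pair, as a minimal sketch rather than HBase's actual comparator: the two cells in the log share the same row, family, qualifier and timestamp, and differ only in type (DeleteColumn vs. Put). Assuming the usual HBase key ordering (newer timestamps first, then higher type codes first, so deletes precede puts at the same timestamp), a Put followed by a DeleteColumn is out of order. The class and method names below are hypothetical; the type codes 4 and 12 are assumed to match KeyValue.Type.

```java
// Illustrative only: KeyOrderSketch is a hypothetical name, not HBase code.
class KeyOrderSketch {
  // Assumed to match HBase's KeyValue.Type codes.
  static final int PUT = 4;
  static final int DELETE_COLUMN = 12;

  /**
   * Compares the timestamp/type tail of two keys that already share the same
   * row, family and qualifier. A negative result means "left sorts first".
   */
  static int compareTail(long leftTs, int leftType, long rightTs, int rightType) {
    // Newer timestamps sort first, hence the reversed argument order.
    int byTimestamp = Long.compare(rightTs, leftTs);
    if (byTimestamp != 0) {
      return byTimestamp;
    }
    // Higher type codes sort first, so a delete precedes a put.
    return Integer.compare(rightType, leftType);
  }

  /** True when appending the current key after the last key breaks the sort order. */
  static boolean violatesOrder(long lastTs, int lastType, long curTs, int curType) {
    return compareTail(lastTs, lastType, curTs, curType) > 0;
  }

  public static void main(String[] args) {
    long ts = 1565881054828L; // timestamp shared by both cells in the log above
    // lastCell = Put, current cell = DeleteColumn, as in the reported error.
    System.out.println(violatesOrder(ts, PUT, ts, DELETE_COLUMN));
    // The reverse order (DeleteColumn first, then Put) is accepted.
    System.out.println(violatesOrder(ts, DELETE_COLUMN, ts, PUT));
  }
}
```

The sketch ignores the sequence id, which in the log is lower for the DeleteColumn (44456567) than for the Put (44457770); whether that plays a role in the root cause is not established here.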

[jira] [Commented] (HBASE-22862) Region Server crash with: Added a key not lexically larger than previous

2019-08-15 Thread Zheng Hu (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-22862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908699#comment-16908699
 ] 

Zheng Hu commented on HBASE-22862:
--

Which HBase version are you using? I think this should be a critical bug...

> Region Server crash with: Added a key not lexically larger than previous
> 
>
> Key: HBASE-22862
> URL: https://issues.apache.org/jira/browse/HBASE-22862
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 1.4.10
> Environment: {code}
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (Zulu 8.31.0.1-linux64) (build 1.8.0_181-b02)
> OpenJDK 64-Bit Server VM (Zulu 8.31.0.1-linux64) (build 25.181-b02, mixed mode)
> {code}
>Reporter: Alex Batyrshin
>Priority: Major
>
> We observe the error "Added a key not lexically larger than previous", which 
> causes most of the region servers in our cluster to crash.
> {code}
> 2019-08-15 18:02:10,554 INFO  [MemStoreFlusher.0] regionserver.HRegion: Flushing 1/1 column families, memstore=56.08 MB
> 2019-08-15 18:02:10,727 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=0
> java.io.IOException: Added a key not lexically larger than previous. Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2125)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:512)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:482)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76)
>     at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264)
>     at java.lang.Thread.run(Thread.java:748)
> 2019-08-15 18:02:21,776 WARN  [MemStoreFlusher.0] regionserver.HStore: Failed flushing store file, retrying num=9
> java.io.IOException: Added a key not lexically larger than previous. Current cell = \x0901820448218>wGavb'/d:elr/1565881054828/DeleteColumn/vlen=0/seqid=44456567, lastCell = \x0901820448218>wGavb'/d:elr/1565881054828/Put/vlen=1/seqid=44457770
>     at org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.checkKey(AbstractHFileWriter.java:204)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:279)
>     at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
>     at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:1127)
>     at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:139)
>     at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
>     at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2523)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2622)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2352)
>     at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2314)
>     at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2200)
>at 
>

[jira] [Resolved] (HBASE-22833) MultiRowRangeFilter should provide a method for creating a filter which is functionally equivalent to multiple prefix filters

2019-08-15 Thread Zheng Hu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-22833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Hu resolved HBASE-22833.
--
  Resolution: Fixed
Hadoop Flags: Reviewed
Release Note: 
Provides a public constructor in the MultiRowRangeFilter class to speed up 
filtering with multiple row prefixes. It expands the row prefixes into 
multiple rowkey ranges inside MultiRowRangeFilter, which is more efficient.
{code}
public MultiRowRangeFilter(byte[][] rowKeyPrefixes);
{code}

> MultiRowRangeFilter should provide a method for creating a filter which is 
> functionally equivalent to multiple prefix filters
> -
>
> Key: HBASE-22833
> URL: https://issues.apache.org/jira/browse/HBASE-22833
> Project: HBase
>  Issue Type: Wish
>  Components: Client
>Affects Versions: 3.0.0
>Reporter: Itsuki Toyota
>Assignee: Itsuki Toyota
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0, 2.0.6, 2.2.1, 2.1.6, 1.4.11
>
>
> Hi,
> I think the current standard way to apply multiple prefix filters is to 
> create a _FilterList_ and add _PrefixFilter_ instances to the list:
> {code:java}
> FilterList allFilters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
> allFilters.addFilter(new PrefixFilter(Bytes.toBytes("123")));
> allFilters.addFilter(new PrefixFilter(Bytes.toBytes("456")));
> allFilters.addFilter(new PrefixFilter(Bytes.toBytes("678")));
> scan.setFilter(allFilters);
> {code}
> (c.f., 
> https://stackoverflow.com/questions/41074213/hbase-how-to-specify-multiple-prefix-filters-in-a-single-scan-operation
>  )
> However, for a single prefix filter, HBase provides the 
> _scan.setRowPrefixFilter_ method.
> This method creates a range filter by setting a start row and a stop row.
> The stop row is computed by calling 
> _calculateTheClosestNextRowKeyForPrefix_ (c.f., 
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Scan.java#L574-L597
>  )
> _MultiRowRangeFilter_ could leverage a list of start row and stop row pairs, 
> and _calculateTheClosestNextRowKeyForPrefix_ could compute the stop row 
> corresponding to a given start row (i.e., a prefix).
> I think this kind of filter (one that is functionally equivalent to 
> multiple prefix filters) should be creatable through _MultiRowRangeFilter_, 
> which would be better than the current standard way.
> Cheers,
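
The prefix-to-range idea described in the issue can be sketched in plain Java without any HBase dependency. This is only an illustration under stated assumptions: PrefixRangeSketch, closestNextRowKeyForPrefix and toRanges are hypothetical names, and the stop-row rule shown (drop trailing 0xFF bytes, then increment the last remaining byte) is my understanding of what the referenced Scan helper does, not a copy of it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: hypothetical names, not the HBase API.
class PrefixRangeSketch {

  /**
   * Returns the smallest row key that sorts after every key starting with the
   * given prefix. An empty array means "no upper bound" (the prefix is empty
   * or consists only of 0xFF bytes).
   */
  static byte[] closestNextRowKeyForPrefix(byte[] prefix) {
    int offset = prefix.length;
    // Trailing 0xFF bytes cannot be incremented, so skip over them.
    while (offset > 0 && prefix[offset - 1] == (byte) 0xFF) {
      offset--;
    }
    if (offset == 0) {
      return new byte[0];
    }
    byte[] stopRow = Arrays.copyOf(prefix, offset);
    stopRow[offset - 1]++; // increment the last byte that is not 0xFF
    return stopRow;
  }

  /** Turns each prefix into a {startRow, stopRow} pair, stop row exclusive. */
  static byte[][][] toRanges(byte[][] prefixes) {
    byte[][][] ranges = new byte[prefixes.length][][];
    for (int i = 0; i < prefixes.length; i++) {
      ranges[i] = new byte[][] {prefixes[i], closestNextRowKeyForPrefix(prefixes[i])};
    }
    return ranges;
  }

  public static void main(String[] args) {
    // The same three prefixes as in the FilterList example from the issue.
    byte[][] prefixes = {
      "123".getBytes(StandardCharsets.UTF_8),
      "456".getBytes(StandardCharsets.UTF_8),
      "678".getBytes(StandardCharsets.UTF_8)
    };
    for (byte[][] range : toRanges(prefixes)) {
      System.out.println(new String(range[0], StandardCharsets.UTF_8)
          + " -> " + new String(range[1], StandardCharsets.UTF_8));
    }
  }
}
```

Each resulting pair corresponds to one start row and stop row range of the kind _MultiRowRangeFilter_ works with, which is why a scan can seek between ranges instead of evaluating every row against each _PrefixFilter_.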



