Those log lines have settled down; they may have been related to a cluster-wide forced restart at the time.
On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

We've been doing more debugging of this and have set up the read vs write handlers to try to at least segment this away so reads can work. We have pretty beefy servers, and are running with the following settings:

hbase.regionserver.handler.count=1000
hbase.ipc.server.read.threadpool.size=50
hbase.ipc.server.callqueue.handler.factor=0.025
hbase.ipc.server.callqueue.read.ratio=0.6
hbase.ipc.server.callqueue.scan.ratio=0.5

We are seeing all 400 write handlers taken up by row locks for the most part. The read handlers are mostly idle. We're thinking of changing the ratio here, but are not sure it will help if they are all blocked on a row lock. We just enabled DEBUG logging on all our servers and notice the following:

2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]

On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <bbeaudrea...@hubspot.com>
wrote:

Sorry, the second link should be
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579

On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

https://gist.github.com/bbeaudreault/2994a748da83d9f75085

An active handler:
https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
One that is locked:
https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579

The difference between pre-rollback and post is that previously we were seeing things blocked in mvcc. Now we are seeing them blocked on the upsert.

It always follows the same pattern of one active handler in the upsert and the rest blocked waiting for it.

On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:

On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

> The rollback seems to have mostly solved the issue for one of our clusters, but another one is still seeing long increment times:
>
> "slowIncrementCount": 52080,
> "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> "Increment_mean": 465.68678129112396, "Increment_median": 216,
> "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> "Increment_99th_percentile": 1635.2399999999998
>
> Any ideas if there are other changes that may be causing a performance regression for increments between CDH4.7.1 and CDH5.3.8?

No.

Post a thread dump Bryan and it might prompt something.
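As an aside for readers following along: a back-of-the-envelope on how the settings quoted above divide up the handlers. This is my own arithmetic and only approximates what HBase's RWQueueRpcExecutor actually does (the real code has extra rounding and minimums), but it shows where the "400 write handlers" figure comes from:

```java
public class HandlerSplit {
    public static void main(String[] args) {
        int handlers = 1000;           // hbase.regionserver.handler.count
        double queueFactor = 0.025;    // hbase.ipc.server.callqueue.handler.factor
        double readRatio = 0.6;        // hbase.ipc.server.callqueue.read.ratio
        double scanRatio = 0.5;        // hbase.ipc.server.callqueue.scan.ratio

        int queues = (int) (handlers * queueFactor);      // 25 call queues
        int readQueues = (int) (queues * readRatio);      // 15 serve reads
        int scanQueues = (int) (readQueues * scanRatio);  // 7 of those serve scans
        int writeQueues = queues - readQueues;            // 10 serve writes

        // Handlers are split in roughly the same proportion as the queues:
        // 60% read, 40% write -- hence "all 400 write handlers" above.
        int writeHandlers = handlers - (int) (handlers * readRatio);

        System.out.println(queues + " queues (" + writeQueues + " write, "
            + (readQueues - scanQueues) + " read, " + scanQueues + " scan), "
            + writeHandlers + " write handlers");
    }
}
```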
St.Ack

On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net> wrote:

On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

> Should this be added as a known issue in the CDH or hbase documentation? It was a severe performance hit for us; all of our regionservers were sitting at a few thousand queued requests.

Let me take care of that.
St.Ack

On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

Yea, they are all over the place and called from client and coprocessor code. We ended up having no other option but to roll back, and aside from a few NoSuchMethodErrors due to API changes (Put#add vs Put#addColumn), it seems to be working and fixing our problem.

On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net> wrote:

Rollback is untested. No fix in 5.5. I was going to work on this now. Where are your counters, Bryan? In their own column family or scattered about in a row with other Cell types?
St.Ack

On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <bbeaudrea...@hubspot.com> wrote:

Is there any update to this? We just upgraded all of our production clusters from CDH4 to CDH5.4.7 and, not seeing this JIRA listed in the known issues, did not know about this. Now we are seeing performance issues across all clusters, as we make heavy use of increments.
Can we roll forward to CDH5.5 to fix? Or is our only hope to roll back to CDH5.3.1 (if that is possible)?

On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <brfrn...@gmail.com> wrote:

Thank you St.Ack!

I would like to follow the ticket.

Toshihiro Suzuki

2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net>:

Back to this problem. Simple tests confirm that, as is, the single-queue-backed MVCC instance can slow Region ops if some other row is slow to complete. In particular Increment, checkAndPut, and batch mutations are affected. I opened HBASE-14460 to start in on a fix up. Let's see if we can somehow scope mvcc to a row, or at least shard mvcc, so that not all Region ops are paused.

St.Ack

On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:

> Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row.
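To make the "shard mvcc" idea mentioned above concrete, here is a toy sketch. This is purely my illustration of the direction being discussed, not the actual HBASE-14460 patch: one write queue per shard, selected by row hash, so an in-flight write only queues behind writes whose rows hash to the same shard, instead of behind every write in the region.

```java
import java.util.Arrays;
import java.util.LinkedList;

// Illustrative only: per-shard write queues chosen by row hash. A slow
// transaction stalls only rows landing in the same shard.
class ShardedWriteQueues {
    private final LinkedList<Object>[] queues;

    @SuppressWarnings("unchecked")
    ShardedWriteQueues(int shards) {
        queues = new LinkedList[shards];
        for (int i = 0; i < shards; i++) queues[i] = new LinkedList<>();
    }

    private LinkedList<Object> queueFor(byte[] row) {
        return queues[(Arrays.hashCode(row) & 0x7fffffff) % queues.length];
    }

    // Begin a write: enqueue a marker on this row's shard only.
    Object begin(byte[] row) {
        LinkedList<Object> q = queueFor(row);
        Object w = new Object();
        synchronized (q) { q.add(w); }
        return w;
    }

    // A write on another shard sees nothing ahead of it.
    boolean hasPriorWrites(byte[] row, Object w) {
        LinkedList<Object> q = queueFor(row);
        synchronized (q) { return q.getFirst() != w; }
    }

    void complete(byte[] row, Object w) {
        LinkedList<Object> q = queueFor(row);
        synchronized (q) { q.remove(w); }
    }
}
```

The trade-off is that MVCC read points become per-shard rather than per-region, which is why this is only a sketch of the idea and not a drop-in fix.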
> Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to the current operation?

Thank you St.Ack! I think this approach would work.

> You need to read back the increment and have it be 'correct' at increment time?

Yes, we need it.

I would like to help if there is anything I can do.

Thanks,
Toshihiro Suzuki

2015-09-13 14:11 GMT+09:00 Stack <st...@duboce.net>:

Thank you for the below reasoning (with accompanying helpful diagram). Makes sense. Let me hack up a test case to help with the illustration. It is as though the mvcc should be scoped to a row only... Writes against other rows should not hold up my read of my row. Tag an mvcc with a 'row' scope so we can see which on-going writes pertain to the current operation?

You need to read back the increment and have it be 'correct' at increment time?
(This is a good one)

Thank you Toshihiro Suzuki
St.Ack

On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:

St.Ack,

Thank you for your response.

The reason I concluded that "A region lock (not a row lock) seems to occur in waitForPreviousTransactionsComplete()" is as follows:

An increment operation has 3 procedures for MVCC.

1. mvcc.waitForPreviousTransactionsComplete();
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712

2. w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721

3.
mvcc.completeMemstoreInsertWithSeqNum(w, walKey);
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893

I think that MultiVersionConsistencyControl's writeQueue can cause a region lock.
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43

Step 2 adds a WriteEntry to writeQueue.
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108

Step 3 removes the WriteEntry from writeQueue.
completeMemstoreInsertWithSeqNum(w, walKey) -> waitForPreviousTransactionsComplete(e) -> advanceMemstore(w)
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160

Step 1 adds a WriteEntry w in beginMemstoreInsert() to writeQueue and waits until writeQueue is empty or writeQueue.getFirst() == w.
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241

I think when a handler thread is processing between step 2 and step 3, the other handler threads can wait at step 1 until the thread completes step 3. This is depicted as follows:

https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png

Actually, in the thread dump of our region server, many handler threads (RW.default.writeRpcServer.handler) wait at Step 1 (waitForPreviousTransactionsComplete()).
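The single-queue behavior described above can be modeled in a few lines. This is a simplified stand-in for MultiVersionConsistencyControl (method names borrowed from the linked source; the real class also tracks sequence ids and read points), written just to show that a thread waiting on its entry must wait for every earlier entry in the region, regardless of which row that earlier write touches:

```java
import java.util.LinkedList;

// Simplified model of the region-wide writeQueue described above.
class SimpleMvcc {
    static final class WriteEntry {
        boolean completed = false;
    }

    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    // beginMemstoreInsert: append an entry to the region-wide queue.
    synchronized WriteEntry beginMemstoreInsert() {
        WriteEntry e = new WriteEntry();
        writeQueue.add(e);
        return e;
    }

    // waitForPreviousTransactionsComplete: block until our entry reaches the
    // head, i.e. all prior transactions -- for ALL rows -- have completed.
    synchronized void waitForPreviousTransactionsComplete(WriteEntry w)
            throws InterruptedException {
        while (!writeQueue.isEmpty() && writeQueue.getFirst() != w) {
            wait();
        }
        completeMemstoreInsert(w); // our marker entry is done immediately
    }

    // completeMemstoreInsert/advanceMemstore: mark done, pop completed
    // entries from the head, and wake the waiters.
    synchronized void completeMemstoreInsert(WriteEntry w) {
        w.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
            writeQueue.removeFirst();
        }
        notifyAll();
    }
}
```

With this model, a handler that calls beginMemstoreInsert() and then stalls (say, in a slow WAL sync) keeps every later waitForPreviousTransactionsComplete() caller parked, which is exactly the pile-up the thread dump shows.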
https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt

Many handler threads wait at this:
https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224

> Is it possible you are contending on a counter post-upgrade? Is it possible that all these threads are trying to get to the same row to update it? Could the app behavior have changed? Or are you thinking increment itself has slowed significantly?

We have just upgraded HBase, not changed the app behavior. We are thinking increment itself has slowed significantly. Before upgrading HBase, it was good throughput and latency. Currently, to cope with this problem, we split the regions finely.
Thanks,

Toshihiro Suzuki

2015-09-09 15:29 GMT+09:00 Stack <st...@duboce.net>:

On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <brfrn...@gmail.com> wrote:

> Ted,
>
> Thank you for your response.
>
> I uploaded the complete stack trace to Gist.
>
> https://gist.github.com/brfrn169/cb4f2c157129330cd932
>
> I think that increment operation works as follows:
>
> 1. get row lock
> 2. mvcc.waitForPreviousTransactionsComplete() // wait for all prior MVCC transactions to finish
> 3. mvcc.beginMemstoreInsertWithSeqNum() // start a transaction
> 4. get previous values
> 5. create KVs
> 6. write to Memstore
> 7. write to WAL
> 8. release row lock
> 9.
> mvcc.completeMemstoreInsertWithSeqNum() // complete the transaction
>
> An instance of MultiVersionConsistencyControl has a pending queue of writes named writeQueue.
> Step 2 puts a WriteEntry w to writeQueue and waits until writeQueue is empty or writeQueue.getFirst() == w.
> Step 3 puts a WriteEntry to writeQueue and step 9 removes the WriteEntry from writeQueue.
>
> I think that when a handler thread is processing between step 2 and step 9, the other handler threads can wait until the thread completes step 9.

That is right. We need to read after all outstanding updates are done... because we need to read the latest update before we go to modify/increment it.

How do you make out this?

"A region lock (not a row lock) seems to occur in waitForPreviousTransactionsComplete()."
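A runnable toy version of steps 1-9 (heavily simplified: no memstore, WAL, or sequence ids, and step 2's marker entry is reduced to "wait for an empty queue") makes the conclusion visible: the row lock in step 1 is per row, but the wait in step 2 funnels every handler through one shared writeQueue, so an increment on row "b" stalls behind a slow increment on row "a":

```java
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the increment flow in steps 1-9 above. Rows "a" and "b"
// have separate locks, yet both handlers serialize on the one writeQueue.
class IncrementModel {
    final LinkedList<Object> writeQueue = new LinkedList<>();
    final ReentrantLock rowLockA = new ReentrantLock();
    final ReentrantLock rowLockB = new ReentrantLock();

    void increment(ReentrantLock rowLock, long workMillis) throws InterruptedException {
        rowLock.lock();                            // 1. get row lock (per row)
        try {
            Object w;
            synchronized (writeQueue) {            // 2. wait for ALL prior
                while (!writeQueue.isEmpty()) {    //    region writes to finish
                    writeQueue.wait();
                }
                w = new Object();                  // 3. begin transaction
                writeQueue.add(w);
            }
            Thread.sleep(workMillis);              // 4-7. read, build cells, write
            synchronized (writeQueue) {            // 9. complete transaction
                writeQueue.remove(w);
                writeQueue.notifyAll();
            }
        } finally {
            rowLock.unlock();                      // 8. release row lock
        }
    }
}
```

Given a head start, thread A's entry is already in writeQueue when B arrives, so B waits out A's entire "work" phase despite holding a different row lock.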
In 0.98.x we did this:

mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());

... and in 1.0 we do this:

mvcc.waitForPreviousTransactionsComplete(), which is this....

+  public void waitForPreviousTransactionsComplete() {
+    WriteEntry w = beginMemstoreInsert();
+    waitForPreviousTransactionsComplete(w);
+  }

The mvcc and region sequenceid were merged in 1.0 (https://issues.apache.org/jira/browse/HBASE-8763). Previously mvcc and region sequenceid would spin independent of each other. Perhaps this is responsible for some slowdown.

That said, looking in your thread dump, we seem to be down in the Get. If you do a bunch of thread dumps in a row, where is the lock-holding thread? In Get or writing Increment... or waiting on sequence id?

Is it possible you are contending on a counter post-upgrade?
Is it possible that all these threads are trying to get to the same row to update it? Could the app behavior have changed? Or are you thinking increment itself has slowed significantly?

St.Ack

Thanks,

Toshihiro Suzuki

2015-09-09 0:05 GMT+09:00 Ted Yu <yuzhih...@gmail.com>:

In HRegion#increment(), we lock the row (not the region):

    try {
      rowLock = getRowLock(row);

Can you pastebin the complete stack trace?

Thanks

On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕 <brfrn...@gmail.com> wrote:

Hi,

We upgraded our cluster from CDH5.3.1 (HBase 0.98.6) to CDH5.4.5 (HBase 1.0.0) and we experience a slowdown in the increment operation.
Here's an extract from a thread dump of the RegionServer of our cluster:

Thread 68 (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
  State: BLOCKED
  Blocked count: 21689888
  Waited count: 39828360
  Blocked on java.util.LinkedList@3474e4b2
  Blocked by 63 (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
    org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
    org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    java.lang.Thread.run(Thread.java:745)

There are many similar threads in the thread dump.

I read the source code and I think this is caused by changes to MultiVersionConsistencyControl. A region lock (not a row lock) seems to occur in waitForPreviousTransactionsComplete().

Also, we wrote performance test code for the increment operation that included 100 threads and ran it in local mode.
The result is shown below:

CDH5.3.1 (HBase 0.98.6)
Throughput (op/s): 12757, Latency (ms): 7.975072509210629

CDH5.4.5 (HBase 1.0.0)
Throughput (op/s): 2027, Latency (ms): 49.11840157868772

Thanks,

Toshihiro Suzuki