I'll try to get another one.  We are currently not seeing the issue due to
lack of contention (it is off hours for our customers).

Note that the stack trace I gave you was taken with a tool we have that
aggregates common stacks. The one at the bottom occurred 122 times out of 128
handlers (this was pre-tuning, before we bumped the handler count to 1000 and
split the read vs. write handler pools).  So to me it looks like 122 of 128
handlers were waiting on:

if (!existingContext.latch.await(this.rowLockWaitDuration,
    TimeUnit.MILLISECONDS)) {
  throw new IOException("Timed out waiting for lock for row: " + rowKey);
}
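
That wait is bounded by rowLockWaitDuration, which, if I am reading HRegion
correctly, comes from the hbase.rowlock.wait.duration setting (default 30000
ms). A hedged example of lowering it so handlers fail fast rather than pile up
behind one slow row -- the value below is illustrative only, not a
recommendation:

hbase.rowlock.wait.duration=5000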


On Tue, Dec 1, 2015 at 12:08 AM Stack <st...@duboce.net> wrote:

> Looking at that stack trace, nothing showing as blocked or slowed by
> another operation. You have others I could look at Bryan?
> St.Ack
>
> On Mon, Nov 30, 2015 at 8:40 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com
> > wrote:
>
> > Yea, sorry if I was misleading.  The nonce log lines we saw only happened
> > on full cluster restart; it may have been the HLogs replaying, not sure.
> >
> > We are still seeing slow Increments. Whereas Gets and Mutates are on the
> > order of 50-150ms according to metrics, Increments are in the
> > 1000-5000ms range. It seems we may be blocking on FSHLog#syncer.
> >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L359
> >
> >
> >
> > On Mon, Nov 30, 2015 at 11:26 PM Stack <st...@duboce.net> wrote:
> >
> > > Still slow increments though?
> > >
> > > On Mon, Nov 30, 2015 at 5:05 PM, Bryan Beaudreault <
> > > bbeaudrea...@hubspot.com
> > > > wrote:
> > >
> > > > Those log lines have settled down; they may have been related to a
> > > > cluster-wide forced restart at the time.
> > > >
> > > > On Mon, Nov 30, 2015 at 7:59 PM Bryan Beaudreault <
> > > > bbeaudrea...@hubspot.com>
> > > > wrote:
> > > >
> > > > > We've been doing more debugging of this and have set up the read vs
> > > > > write handlers to try to at least segment this away so reads can work.
> > > > > We have pretty beefy servers, and are running with the following settings:
> > > > >
> > > > > hbase.regionserver.handler.count=1000
> > > > > hbase.ipc.server.read.threadpool.size=50
> > > > > hbase.ipc.server.callqueue.handler.factor=0.025
> > > > > hbase.ipc.server.callqueue.read.ratio=0.6
> > > > > hbase.ipc.server.callqueue.scan.ratio=0.5
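> > > > >
> > > > > (Rough handler math, if I understand the callqueue split correctly: with
> > > > > handler.count=1000 and callqueue.read.ratio=0.6, roughly 1000 * 0.6 = 600
> > > > > handlers end up serving reads and 1000 * 0.4 = 400 serving writes, which
> > > > > is where the "400 write handlers" figure below comes from.)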
> > > > >
> > > > > We are seeing all 400 write handlers taken up by row locks for the most
> > > > > part. The read handlers are mostly idle. We're thinking of changing the
> > > > > ratio here, but are not sure it will help if they are all blocked on a
> > > > > row lock.  We just enabled DEBUG logging on all our servers and notice
> > > > > the following:
> > > > >
> > > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:7664336281906118656], [state 0, hasWait false, activity 00:54:36.240]
> > > > > 2015-12-01 00:56:09,015 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-687451119961178644:-7119840249342174227], [state 0, hasWait false, activity 00:54:36.256]
> > > > > 2015-12-01 00:56:09,268 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [-5946137511131403479:2112661701888365489], [state 0, hasWait false, activity 00:55:01.259]
> > > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:6256955295384472057], [state 0, hasWait false, activity 00:53:58.151]
> > > > > 2015-12-01 00:56:09,279 DEBUG org.apache.hadoop.hbase.regionserver.ServerNonceManager: Conflict detected by nonce: [4165332617675853029:4961178013070912522], [state 0, hasWait false, activity 00:53:58.162]
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2015 at 6:11 PM Bryan Beaudreault <
> > > > > bbeaudrea...@hubspot.com> wrote:
> > > > >
> > > > >> Sorry the second link should be
> > > > >>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L579
> > > > >>
> > > > >> On Mon, Nov 30, 2015 at 6:10 PM Bryan Beaudreault <
> > > > >> bbeaudrea...@hubspot.com> wrote:
> > > > >>
> > > > >>> https://gist.github.com/bbeaudreault/2994a748da83d9f75085
> > > > >>>
> > > > >>> An active handler:
> > > > >>>
> > > >
> > >
> >
> https://gist.github.com/bbeaudreault/2994a748da83d9f75085#file-gistfile1-txt-L286
> > > > >>> One that is locked:
> > > > >>>
> > > >
> > >
> >
> https://git.hubteam.com/gist/jwilliams/80f37999bfdf55119588#file-gistfile1-txt-L579
> > > > >>>
> > > > >>> The difference between pre-rollback and post is that previously we
> > > > >>> were seeing things blocked in mvcc.  Now we are seeing them blocked on
> > > > >>> the upsert.
> > > > >>>
> > > > >>> It always follows the same pattern: one active handler in the upsert
> > > > >>> and the rest blocked waiting for it.
> > > > >>>
> > > > >>> On Mon, Nov 30, 2015 at 6:05 PM Stack <st...@duboce.net> wrote:
> > > > >>>
> > > > >>>> On Mon, Nov 30, 2015 at 2:31 PM, Bryan Beaudreault <
> > > > >>>> bbeaudrea...@hubspot.com
> > > > >>>> > wrote:
> > > > >>>>
> > > > >>>> > The rollback seems to have mostly solved the issue for one of
> > our
> > > > >>>> clusters,
> > > > >>>> > but another one is still seeing long increment times:
> > > > >>>> >
> > > > >>>> > "slowIncrementCount": 52080,
> > > > >>>> > "Increment_num_ops": 325236, "Increment_min": 1, "Increment_max": 6162,
> > > > >>>> > "Increment_mean": 465.68678129112396, "Increment_median": 216,
> > > > >>>> > "Increment_75th_percentile": 450.25, "Increment_95th_percentile": 1052.6499999999999,
> > > > >>>> > "Increment_99th_percentile": 1635.2399999999998
> > > > >>>> >
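> > > > >>>> > (Roughly 52080 / 325236, i.e. about 16% of increments on that cluster,
> > > > >>>> > are being flagged as slow.)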
> > > > >>>> >
> > > > >>>> > Any ideas if there are other changes that may be causing a
> > > > performance
> > > > >>>> > regression for increments between CDH4.7.1 and CDH5.3.8?
> > > > >>>> >
> > > > >>>> >
> > > > >>>> >
> > > > >>>> No.
> > > > >>>>
> > > > >>>> Post a thread dump Bryan and it might prompt something.
> > > > >>>>
> > > > >>>> St.Ack
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> >
> > > > >>>> > On Mon, Nov 30, 2015 at 4:13 PM Stack <st...@duboce.net>
> wrote:
> > > > >>>> >
> > > > >>>> > > On Mon, Nov 30, 2015 at 12:54 PM, Bryan Beaudreault <
> > > > >>>> > > bbeaudrea...@hubspot.com> wrote:
> > > > >>>> > >
> > > > >>>> > > > Should this be added as a known issue in the CDH or HBase
> > > > >>>> > > > documentation? It was a severe performance hit for us; all of our
> > > > >>>> > > > regionservers were sitting at a few thousand queued requests.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > Let me take care of that.
> > > > >>>> > > St.Ack
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > >
> > > > >>>> > > > On Mon, Nov 30, 2015 at 3:53 PM Bryan Beaudreault <
> > > > >>>> > > > bbeaudrea...@hubspot.com>
> > > > >>>> > > > wrote:
> > > > >>>> > > >
> > > > >>>> > > > > Yea, they are all over the place and called from client and
> > > > >>>> > > > > coprocessor code. We ended up having no other option but to roll
> > > > >>>> > > > > back, and aside from a few NoSuchMethodErrors due to API changes
> > > > >>>> > > > > (Put#add vs Put#addColumn), it seems to be working and fixing our
> > > > >>>> > > > > problem.
> > > > >>>> > > > >
> > > > >>>> > > > > On Mon, Nov 30, 2015 at 3:47 PM Stack <st...@duboce.net
> >
> > > > wrote:
> > > > >>>> > > > >
> > > > >>>> > > > >> Rollback is untested. No fix in 5.5. I was going to
> work
> > on
> > > > >>>> this
> > > > >>>> > now.
> > > > >>>> > > > >> Where
> > > > >>>> > > > >> are your counters Bryan? In their own column family or
> > > > >>>> scattered
> > > > >>>> > about
> > > > >>>> > > > in
> > > > >>>> > > > >> a
> > > > >>>> > > > >> row with other Cell types?
> > > > >>>> > > > >> St.Ack
> > > > >>>> > > > >>
> > > > >>>> > > > >> On Mon, Nov 30, 2015 at 10:28 AM, Bryan Beaudreault <
> > > > >>>> > > > >> bbeaudrea...@hubspot.com> wrote:
> > > > >>>> > > > >>
> > > > >>>> > > > >> > Is there any update to this? We just upgraded all of our
> > > > >>>> > > > >> > production clusters from CDH4 to CDH5.4.7 and, not seeing this
> > > > >>>> > > > >> > JIRA listed in the known issues, did not know about this.  Now we
> > > > >>>> > > > >> > are seeing performance issues across all clusters, as we make
> > > > >>>> > > > >> > heavy use of increments.
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > Can we roll forward to CDH5.5 to fix? Or is our only
> > hope
> > > > to
> > > > >>>> roll
> > > > >>>> > > back
> > > > >>>> > > > >> to
> > > > >>>> > > > >> > CDH 5.3.1 (if that is possible)?
> > > > >>>> > > > >> >
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > On Thu, Sep 24, 2015 at 5:06 AM 鈴木俊裕 <
> > brfrn...@gmail.com
> > > >
> > > > >>>> wrote:
> > > > >>>> > > > >> >
> > > > >>>> > > > >> > > Thank you St.Ack!
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > I would like to follow the ticket.
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > Toshihiro Suzuki
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > 2015-09-22 14:14 GMT+09:00 Stack <st...@duboce.net
> >:
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> > > > Back to this problem. Simple tests confirm that, as is,
> > > > >>>> > > > >> > > > the single-queue-backed MVCC instance can slow Region ops
> > > > >>>> > > > >> > > > if some other row is slow to complete. In particular
> > > > >>>> > > > >> > > > Increment, checkAndPut, and batch mutations are affected.
> > > > >>>> > > > >> > > > I opened HBASE-14460 to start in on a fix up. Let's see if
> > > > >>>> > > > >> > > > we can somehow scope mvcc to row or at least shard mvcc so
> > > > >>>> > > > >> > > > not all Region ops are paused.
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > St.Ack
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > On Mon, Sep 14, 2015 at 4:15 AM, 鈴木俊裕 <
> > > > >>>> brfrn...@gmail.com>
> > > > >>>> > > wrote:
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with
> > > > accompanying
> > > > >>>> > helpful
> > > > >>>> > > > >> > > diagram).
> > > > >>>> > > > >> > > > > > Makes sense. Let me hack up a test case to
> help
> > > > with
> > > > >>>> the
> > > > >>>> > > > >> > > illustration.
> > > > >>>> > > > >> > > > It
> > > > >>>> > > > >> > > > > > is as though the mvcc should be scoped to a
> row
> > > > >>>> only...
> > > > >>>> > > Writes
> > > > >>>> > > > >> > > against
> > > > >>>> > > > >> > > > > > other rows should not hold up my read of my
> > row.
> > > > Tag
> > > > >>>> an
> > > > >>>> > mvcc
> > > > >>>> > > > >> with a
> > > > >>>> > > > >> > > > 'row'
> > > > >>>> > > > >> > > > > > scope so we can see which on-going writes
> > pertain
> > > > to
> > > > >>>> > current
> > > > >>>> > > > >> > > operation?
> > > > >>>> > > > >> > > > > Thank you St.Ack! I think this approach would
> > work.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have
> it
> > > be
> > > > >>>> > 'correct'
> > > > >>>> > > > at
> > > > >>>> > > > >> > > > increment
> > > > >>>> > > > >> > > > > > time?
> > > > >>>> > > > >> > > > > Yes, we need it.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > I would like to help if there is anything I can
> > do.
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > Thanks,
> > > > >>>> > > > >> > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > 2015-09-13 14:11 GMT+09:00 Stack <
> > st...@duboce.net
> > > >:
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > > > > Thank you for the below reasoning (with
> > > > accompanying
> > > > >>>> > helpful
> > > > >>>> > > > >> > > diagram).
> > > > >>>> > > > >> > > > > > Makes sense. Let me hack up a test case to
> help
> > > > with
> > > > >>>> the
> > > > >>>> > > > >> > > illustration.
> > > > >>>> > > > >> > > > It
> > > > >>>> > > > >> > > > > > is as though the mvcc should be scoped to a
> row
> > > > >>>> only...
> > > > >>>> > > Writes
> > > > >>>> > > > >> > > against
> > > > >>>> > > > >> > > > > > other rows should not hold up my read of my
> > row.
> > > > Tag
> > > > >>>> an
> > > > >>>> > mvcc
> > > > >>>> > > > >> with a
> > > > >>>> > > > >> > > > 'row'
> > > > >>>> > > > >> > > > > > scope so we can see which on-going writes
> > pertain
> > > > to
> > > > >>>> > current
> > > > >>>> > > > >> > > operation?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > You need to read back the increment and have
> it
> > > be
> > > > >>>> > 'correct'
> > > > >>>> > > > at
> > > > >>>> > > > >> > > > increment
> > > > >>>> > > > >> > > > > > time?
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > (This is a good one)
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > Thank you Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > St.Ack
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > On Sat, Sep 12, 2015 at 8:09 AM, 鈴木俊裕 <
> > > > >>>> brfrn...@gmail.com
> > > > >>>> > >
> > > > >>>> > > > >> wrote:
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > > > > St.Ack,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > The reason I think "A region lock (not a row lock)
> > > > >>>> > > > >> > > > > > > seems to occur in waitForPreviousTransactionsComplete()"
> > > > >>>> > > > >> > > > > > > is as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > An increment operation has 3 procedures for MVCC:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 1.
> > mvcc.waitForPreviousTransactionsComplete();
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6712
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2. w =
> > > > mvcc.beginMemstoreInsertWithSeqNum(mvccNum);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6721
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 3. mvcc.completeMemstoreInsertWithSeqNum(w,
> > > > >>>> walKey);
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6893
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think that MultiVersionConsistencyControl's
> > > > >>>> > > > >> > > > > > > writeQueue can cause a region lock.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L42-L43
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 2 adds a WriteEntry to writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L102-L108
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 3 removes the WriteEntry from
> > writeQueue.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > completeMemstoreInsertWithSeqNum(w, walKey)
> > ->
> > > > >>>> > > > >> > > > > > > waitForPreviousTransactionsComplete(e) ->
> > > > >>>> > > advanceMemstore(w)
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L127
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L235
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L160
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Step 1 adds a WriteEntry w in beginMemstoreInsert()
> > > > >>>> > > > >> > > > > > > to writeQueue and waits until writeQueue is empty or
> > > > >>>> > > > >> > > > > > > writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L201-L204
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L206-L241
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > I think when a handler thread is processing between
> > > > >>>> > > > >> > > > > > > step 2 and step 3, the other handler threads can wait
> > > > >>>> > > > >> > > > > > > at step 1 until the thread completes step 3.
> > > > >>>> > > > >> > > > > > > This is depicted as follows:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/region_lock.png
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Actually, in the thread dump of our region
> > > > server,
> > > > >>>> many
> > > > >>>> > > > >> handler
> > > > >>>> > > > >> > > > threads
> > > > >>>> > > > >> > > > > > > (RW.default.writeRpcServer.handler) wait at
> > > Step
> > > > 1
> > > > >>>> > > > >> > > > > > > (waitForPreviousTransactionsComplete()).
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://gist.githubusercontent.com/brfrn169/cb4f2c157129330cd932/raw/86d6aae5667b0fe006b16fed80f1b0c4945c7fd0/thread_dump.txt
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Many handler threads wait at this:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://github.com/cloudera/hbase/blob/cdh5.4.5-release/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MultiVersionConsistencyControl.java#L224
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a
> > > counter
> > > > >>>> > > > post-upgrade?
> > > > >>>> > > > >> > Is
> > > > >>>> > > > >> > > it
> > > > >>>> > > > >> > > > > > > > possible that all these threads are
> trying
> > to
> > > > >>>> get to
> > > > >>>> > the
> > > > >>>> > > > >> same
> > > > >>>> > > > >> > row
> > > > >>>> > > > >> > > > to
> > > > >>>> > > > >> > > > > > > update
> > > > >>>> > > > >> > > > > > > > it? Could the app behavior have changed?
> > Or
> > > > are
> > > > >>>> you
> > > > >>>> > > > >> thinking
> > > > >>>> > > > >> > > > > increment
> > > > >>>> > > > >> > > > > > > > itself has slowed significantly?
> > > > >>>> > > > >> > > > > > > We have just upgraded HBase, not changed
> the
> > > app
> > > > >>>> > behavior.
> > > > >>>> > > > We
> > > > >>>> > > > >> are
> > > > >>>> > > > >> > > > > > thinking
> > > > >>>> > > > >> > > > > > > increment itself has slowed significantly.
> > > > >>>> > > > >> > > > > > > Before upgrading HBase, it was good
> > throughput
> > > > and
> > > > >>>> > > latency.
> > > > >>>> > > > >> > > > > > > Currently, to cope with this problem, we
> > split
> > > > the
> > > > >>>> > regions
> > > > >>>> > > > >> > finely.
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > 2015-09-09 15:29 GMT+09:00 Stack <
> > > > st...@duboce.net
> > > > >>>> >:
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > > > > On Tue, Sep 8, 2015 at 10:22 PM, 鈴木俊裕 <
> > > > >>>> > > brfrn...@gmail.com
> > > > >>>> > > > >
> > > > >>>> > > > >> > > wrote:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Ted,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thank you for your response.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I uploaded the complete stack trace to
> > > Gist.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > https://gist.github.com/brfrn169/cb4f2c157129330cd932
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that the increment operation works as
> > > > >>>> > > > >> > > > > > > > > follows:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 1. get row lock
> > > > >>>> > > > >> > > > > > > > > 2.
> > > mvcc.waitForPreviousTransactionsComplete()
> > > > >>>> //
> > > > >>>> > wait
> > > > >>>> > > > for
> > > > >>>> > > > >> all
> > > > >>>> > > > >> > > > prior
> > > > >>>> > > > >> > > > > > > MVCC
> > > > >>>> > > > >> > > > > > > > > transactions to finish
> > > > >>>> > > > >> > > > > > > > > 3. mvcc.beginMemstoreInsertWithSeqNum()
> > //
> > > > >>>> start a
> > > > >>>> > > > >> > transaction
> > > > >>>> > > > >> > > > > > > > > 4. get previous values
> > > > >>>> > > > >> > > > > > > > > 5. create KVs
> > > > >>>> > > > >> > > > > > > > > 6. write to Memstore
> > > > >>>> > > > >> > > > > > > > > 7. write to WAL
> > > > >>>> > > > >> > > > > > > > > 8. release row lock
> > > > >>>> > > > >> > > > > > > > > 9.
> > mvcc.completeMemstoreInsertWithSeqNum()
> > > //
> > > > >>>> > complete
> > > > >>>> > > > the
> > > > >>>> > > > >> > > > > > transaction
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > An instance of MultiVersionConsistencyControl
> > > > >>>> > > > >> > > > > > > > > has a pending queue of writes named writeQueue.
> > > > >>>> > > > >> > > > > > > > > Step 2 puts a WriteEntry w to
> writeQueue
> > > and
> > > > >>>> waits
> > > > >>>> > > until
> > > > >>>> > > > >> > > > writeQueue
> > > > >>>> > > > >> > > > > > is
> > > > >>>> > > > >> > > > > > > > > empty or writeQueue.getFirst() == w.
> > > > >>>> > > > >> > > > > > > > > Step 3 puts a WriteEntry to writeQueue
> > and
> > > > >>>> step 9
> > > > >>>> > > > removes
> > > > >>>> > > > >> the
> > > > >>>> > > > >> > > > > > > WriteEntry
> > > > >>>> > > > >> > > > > > > > > from writeQueue.
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > I think that when a handler thread is processing
> > > > >>>> > > > >> > > > > > > > > between step 2 and step 9, the other handler threads
> > > > >>>> > > > >> > > > > > > > > can wait until the thread completes step 9.
> > > > >>>> > > > >> > > > > > > > >
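> > > > >>>> > > > >> > > > > > > > > To illustrate, here is a simplified, self-contained
> > > > >>>> > > > >> > > > > > > > > sketch of that region-wide queueing behavior (a model
> > > > >>>> > > > >> > > > > > > > > only, not the actual MultiVersionConsistencyControl
> > > > >>>> > > > >> > > > > > > > > source):
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > import java.util.LinkedList;
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > // Simplified model of the region-wide writeQueue described above. One slow
> > > > >>>> > > > >> > > > > > > > > // writer, on any row, stalls every other handler that enters
> > > > >>>> > > > >> > > > > > > > > // waitForPreviousTransactionsComplete() for the same region.
> > > > >>>> > > > >> > > > > > > > > class RegionWideMvccSketch {
> > > > >>>> > > > >> > > > > > > > >   static final class WriteEntry { boolean completed; }
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >   private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >   // Steps 2/3: enqueue an entry for this write.
> > > > >>>> > > > >> > > > > > > > >   synchronized WriteEntry begin() {
> > > > >>>> > > > >> > > > > > > > >     WriteEntry e = new WriteEntry();
> > > > >>>> > > > >> > > > > > > > >     writeQueue.addLast(e);
> > > > >>>> > > > >> > > > > > > > >     return e;
> > > > >>>> > > > >> > > > > > > > >   }
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >   // Step 9: mark our entry done, drop completed entries from the head and
> > > > >>>> > > > >> > > > > > > > >   // wake anyone blocked in waitForPreviousTransactionsComplete().
> > > > >>>> > > > >> > > > > > > > >   synchronized void complete(WriteEntry e) {
> > > > >>>> > > > >> > > > > > > > >     e.completed = true;
> > > > >>>> > > > >> > > > > > > > >     while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
> > > > >>>> > > > >> > > > > > > > >       writeQueue.removeFirst();
> > > > >>>> > > > >> > > > > > > > >     }
> > > > >>>> > > > >> > > > > > > > >     notifyAll();
> > > > >>>> > > > >> > > > > > > > >   }
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >   // Step 1: enqueue a marker and block until every entry queued before it,
> > > > >>>> > > > >> > > > > > > > >   // for any row in the region, has completed.
> > > > >>>> > > > >> > > > > > > > >   synchronized void waitForPreviousTransactionsComplete() throws InterruptedException {
> > > > >>>> > > > >> > > > > > > > >     WriteEntry marker = begin();
> > > > >>>> > > > >> > > > > > > > >     while (!writeQueue.isEmpty() && writeQueue.getFirst() != marker) {
> > > > >>>> > > > >> > > > > > > > >       wait();
> > > > >>>> > > > >> > > > > > > > >     }
> > > > >>>> > > > >> > > > > > > > >     complete(marker);
> > > > >>>> > > > >> > > > > > > > >   }
> > > > >>>> > > > >> > > > > > > > > }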
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > That is right. We need to read, after all
> > > > >>>> outstanding
> > > > >>>> > > > >> updates
> > > > >>>> > > > >> > are
> > > > >>>> > > > >> > > > > > done...
> > > > >>>> > > > >> > > > > > > > because we need to read the latest update
> > > > before
> > > > >>>> we go
> > > > >>>> > > to
> > > > >>>> > > > >> > > > > > > modify/increment
> > > > >>>> > > > >> > > > > > > > it.
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > How do you make out this?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > "A region lock (not a row lock) seems to
> > > occur
> > > > in
> > > > >>>> > > > >> > > > > > > > waitForPreviousTransactionsComplete()."
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > In 0.98.x we did this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > mvcc.completeMemstoreInsert(mvcc.beginMemstoreInsert());
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > ... and in 1.0 we do this:
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> mvcc.waitForPreviousTransactionsComplete()
> > > > which
> > > > >>>> is
> > > > >>>> > > > this....
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > +  public void
> > > > >>>> waitForPreviousTransactionsComplete() {
> > > > >>>> > > > >> > > > > > > > +    WriteEntry w =
> beginMemstoreInsert();
> > > > >>>> > > > >> > > > > > > > +
> > waitForPreviousTransactionsComplete(w);
> > > > >>>> > > > >> > > > > > > > +  }
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > The mvcc and region sequenceid were
> merged
> > in
> > > > >>>> 1.0 (
> > > > >>>> > > > >> > > > > > > >
> > > > https://issues.apache.org/jira/browse/HBASE-8763
> > > > >>>> ).
> > > > >>>> > > > Previous
> > > > >>>> > > > >> > mvcc
> > > > >>>> > > > >> > > > and
> > > > >>>> > > > >> > > > > > > > region
> > > > >>>> > > > >> > > > > > > > sequenceid would spin independent of each
> > > > other.
> > > > >>>> > > > >> > > > > > > > Perhaps this is responsible for some slowdown.
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > That said, looking in your thread dump,
> we
> > > seem
> > > > >>>> to be
> > > > >>>> > > down
> > > > >>>> > > > >> in
> > > > >>>> > > > >> > the
> > > > >>>> > > > >> > > > > Get.
> > > > >>>> > > > >> > > > > > If
> > > > >>>> > > > >> > > > > > > > you do a bunch of thread dumps in a row,
> > > where
> > > > >>>> is the
> > > > >>>> > > > >> > > lock-holding
> > > > >>>> > > > >> > > > > > > thread?
> > > > >>>> > > > >> > > > > > > > In Get or writing Increment... or waiting
> > on
> > > > >>>> sequence
> > > > >>>> > > id?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > Is it possible you are contending on a
> > > counter
> > > > >>>> > > > post-upgrade?
> > > > >>>> > > > >> > Is
> > > > >>>> > > > >> > > it
> > > > >>>> > > > >> > > > > > > > possible that all these threads are
> trying
> > to
> > > > >>>> get to
> > > > >>>> > the
> > > > >>>> > > > >> same
> > > > >>>> > > > >> > row
> > > > >>>> > > > >> > > > to
> > > > >>>> > > > >> > > > > > > update
> > > > >>>> > > > >> > > > > > > > it? Could the app behavior have changed?
> > Or
> > > > are
> > > > >>>> you
> > > > >>>> > > > >> thinking
> > > > >>>> > > > >> > > > > increment
> > > > >>>> > > > >> > > > > > > > itself has slowed significantly?
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > St.Ack
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > 2015-09-09 0:05 GMT+09:00 Ted Yu <
> > > > >>>> > yuzhih...@gmail.com
> > > > >>>> > > >:
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > In HRegion#increment(), we lock the
> row
> > > > (not
> > > > >>>> > > region):
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >     try {
> > > > >>>> > > > >> > > > > > > > > >       rowLock = getRowLock(row);
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Can you pastebin the complete stack
> > > trace ?
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > Thanks
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > On Tue, Sep 8, 2015 at 2:01 AM, 鈴木俊裕
> <
> > > > >>>> > > > >> brfrn...@gmail.com>
> > > > >>>> > > > >> > > > wrote:
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Hi,
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > We upgraded our cluster from CDH5.3.1(HBase0.98.6)
> > > > >>>> > > > >> > > > > > > > > > > to CDH5.4.5(HBase1.0.0) and we are experiencing a
> > > > >>>> > > > >> > > > > > > > > > > slowdown in increment operations.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Here's an extract from a thread dump of the
> > > > >>>> > > > >> > > > > > > > > > > RegionServer of our cluster:
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Thread 68
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > (RW.default.writeRpcServer.handler=15,queue=5,port=60020):
> > > > >>>> > > > >> > > > > > > > > > >   State: BLOCKED
> > > > >>>> > > > >> > > > > > > > > > >   Blocked count: 21689888
> > > > >>>> > > > >> > > > > > > > > > >   Waited count: 39828360
> > > > >>>> > > > >> > > > > > > > > > >   Blocked on
> > > > java.util.LinkedList@3474e4b2
> > > > >>>> > > > >> > > > > > > > > > >   Blocked by 63
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > (RW.default.writeRpcServer.handler=10,queue=0,port=60020)
> > > > >>>> > > > >> > > > > > > > > > >   Stack:
> > > > >>>> > > > >> > > > > > > > > > >     java.lang.Object.wait(Native
> > > Method)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:224)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.MultiVersionConsistencyControl.waitForPreviousTransactionsComplete(MultiVersionConsistencyControl.java:203)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> >
> > > > >>>>
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:6712)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.increment(RSRpcServices.java:501)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:570)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1901)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31451)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > >
> > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2035)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > >
> > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > >
> > > > >>>>
> > org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > > > >>>> > > > >> > > > > > > > > > >
> > >  java.lang.Thread.run(Thread.java:745)
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > There are many similar threads in
> the
> > > > >>>> thread
> > > > >>>> > dump.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > I read the source code and I think
> > this
> > > > is
> > > > >>>> > caused
> > > > >>>> > > by
> > > > >>>> > > > >> > > changes
> > > > >>>> > > > >> > > > of
> > > > >>>> > > > >> > > > > > > > > > > MultiVersionConsistencyControl.
> > > > >>>> > > > >> > > > > > > > > > > A region lock (not a row lock)
> seems
> > to
> > > > >>>> occur in
> > > > >>>> > > > >> > > > > > > > > > >
> > waitForPreviousTransactionsComplete().
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Also we wrote performance test code
> > for
> > > > >>>> > increment
> > > > >>>> > > > >> > operation
> > > > >>>> > > > >> > > > > that
> > > > >>>> > > > >> > > > > > > > > included
> > > > >>>> > > > >> > > > > > > > > > > 100 threads and ran it in local
> mode.
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > The result is shown below:
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > CDH5.3.1(HBase0.98.6)
> > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 12757, Latency(ms): 7.975072509210629
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > CDH5.4.5(HBase1.0.0)
> > > > >>>> > > > >> > > > > > > > > > > Throughput(op/s): 2027, Latency(ms): 49.11840157868772
> > > > >>>> > > > >> > > > > > > > > > >
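> > > > >>>> > > > >> > > > > > > > > > > A sketch of the kind of 100-thread increment test
> > > > >>>> > > > >> > > > > > > > > > > described above (not the actual test code; the table
> > > > >>>> > > > >> > > > > > > > > > > name "bench", family "f", qualifier "q", and the row
> > > > >>>> > > > >> > > > > > > > > > > naming are made up for illustration):
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > import java.util.concurrent.ExecutorService;
> > > > >>>> > > > >> > > > > > > > > > > import java.util.concurrent.Executors;
> > > > >>>> > > > >> > > > > > > > > > > import java.util.concurrent.TimeUnit;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.conf.Configuration;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.TableName;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.client.Connection;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.client.ConnectionFactory;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.client.Table;
> > > > >>>> > > > >> > > > > > > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > public class IncrementBench {
> > > > >>>> > > > >> > > > > > > > > > >   public static void main(String[] args) throws Exception {
> > > > >>>> > > > >> > > > > > > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > >>>> > > > >> > > > > > > > > > >     try (Connection conn = ConnectionFactory.createConnection(conf)) {
> > > > >>>> > > > >> > > > > > > > > > >       ExecutorService pool = Executors.newFixedThreadPool(100);
> > > > >>>> > > > >> > > > > > > > > > >       for (int t = 0; t < 100; t++) {
> > > > >>>> > > > >> > > > > > > > > > >         final int id = t;
> > > > >>>> > > > >> > > > > > > > > > >         pool.submit(() -> {
> > > > >>>> > > > >> > > > > > > > > > >           // each thread increments its own row, so contention is on the
> > > > >>>> > > > >> > > > > > > > > > >           // region-wide MVCC queue rather than on a single row lock
> > > > >>>> > > > >> > > > > > > > > > >           try (Table table = conn.getTable(TableName.valueOf("bench"))) {
> > > > >>>> > > > >> > > > > > > > > > >             for (int i = 0; i < 10000; i++) {
> > > > >>>> > > > >> > > > > > > > > > >               table.incrementColumnValue(Bytes.toBytes("row-" + id),
> > > > >>>> > > > >> > > > > > > > > > >                   Bytes.toBytes("f"), Bytes.toBytes("q"), 1L);
> > > > >>>> > > > >> > > > > > > > > > >             }
> > > > >>>> > > > >> > > > > > > > > > >           } catch (Exception e) {
> > > > >>>> > > > >> > > > > > > > > > >             e.printStackTrace();
> > > > >>>> > > > >> > > > > > > > > > >           }
> > > > >>>> > > > >> > > > > > > > > > >         });
> > > > >>>> > > > >> > > > > > > > > > >       }
> > > > >>>> > > > >> > > > > > > > > > >       pool.shutdown();
> > > > >>>> > > > >> > > > > > > > > > >       pool.awaitTermination(1, TimeUnit.HOURS);
> > > > >>>> > > > >> > > > > > > > > > >     }
> > > > >>>> > > > >> > > > > > > > > > >   }
> > > > >>>> > > > >> > > > > > > > > > > }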
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Thanks,
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > > > Toshihiro Suzuki
> > > > >>>> > > > >> > > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > > >
> > > > >>>> > > > >> > > > > > > > >
> > > > >>>> > > > >> > > > > > > >
> > > > >>>> > > > >> > > > > > >
> > > > >>>> > > > >> > > > > >
> > > > >>>> > > > >> > > > >
> > > > >>>> > > > >> > > >
> > > > >>>> > > > >> > >
> > > > >>>> > > > >> >
> > > > >>>> > > > >>
> > > > >>>> > > > >
> > > > >>>> > > >
> > > > >>>> > >
> > > > >>>> >
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>
