[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-22 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: 14460.v0.branch-1.0.patch

Patch for branch-1.0. This is what I'm going to fix first. This patch tries to 
minimize change. The master branch will be very different, with a radical redo 
(it is warranted given the code duplication and duplication of record keeping; 
i.e. we keep all incremented Cells twice: once as a standalone list and then 
again inside FSWALEntry). Here is the commit log message:

{code}
Patch for branch-1.0 first. Will address later branches with a different
approach (a more radical fixup). Here we are trying to be safe making  minimal
change.

This patch adds a fast increment. To enable it you set the below configuration
to true in your hbase-site.xml configuration:

  hbase.increment.fast.but.narrow.consistency

This sets region to take the fast increment path. Constraint is that caller
can only access the  Cell via Increment; intermixing Increment with other
Mutations will give indeterminate results. Get will work or an Increment of zero
will return current value.

So, to add the above, we effectively copy/paste current Increment after doing
a bunch of work to try and move common code out into methods that can be
shared. Current increment becomes a switch and dependent on config we take the
slow but consistent or the fast but narrowly consistent code path. Increment
code path has too much state that it needs to keep up so hard to shrink it down
more than what we have here without radical refactor (TODO in master patch; the
refactor is needed because even cursory exploration has us DUPLICATING lists
of Cells ... some of which is addressed on fast path here but more to do; fast
path also simplifies the write to hbase so am able to drop some of the state
keeping).

Adds a carryForward set of methods for Tags handling which allows us clean up
some duplicated code.

So, difference between fastAndNarrowConsistencyIncrement and
slowButConsistentIncrement is that the former holds the row lock until the sync
completes; this allows us to reason that there are no other writers afoot when
we read the current increment value. This means we do not wait on mvcc reads to
catch up to writes before we proceed with the read, the root of the slowdown
seen in HBASE-14460. The fast-path also does not wait on mvcc to complete before
returning to the client and we reorder the write so that the update of memstore
happens AFTER sync returns; i.e. the write pipeline is less zigzagging now.

Added some simple concurrency testing and then a performance testing tool for
Increments.

Added test that Increment of zero amount returns the current Increment value.
{code}
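
The ordering the commit message describes for the fast path (read, append, sync, then update memstore, all while holding the row lock) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the attached patch: the class name ToyFastIncrementRegion, its in-memory "WAL" list, and the choice to skip the WAL write for a zero increment are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of the fast increment ordering; hypothetical names, not HBase code. */
final class ToyFastIncrementRegion {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, Long> memstore = new ConcurrentHashMap<>();
    private final List<String> wal = new ArrayList<>(); // guarded by "this"

    /** Increments one row; an amount of zero just returns the current value. */
    long increment(String row, long amount) {
        ReentrantLock rowLock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        rowLock.lock();
        try {
            // 1. Read: no other writer can touch this row while we hold its lock,
            //    so there is no need to wait for an mvcc read point to catch up.
            long current = memstore.getOrDefault(row, 0L);
            if (amount == 0) {
                return current; // assumption: the toy skips the write entirely
            }
            long updated = current + amount;
            // 2 + 3. Append to the WAL and sync, still under the row lock.
            appendAndSync(row + "=" + updated);
            // 4. Only after the sync returns is the memstore updated.
            memstore.put(row, updated);
            return updated;
        } finally {
            rowLock.unlock();
        }
    }

    /** Stand-in for WAL append plus sync; a real WAL would flush to durable storage. */
    private synchronized void appendAndSync(String edit) {
        wal.add(edit);
    }

    synchronized int walSize() {
        return wal.size();
    }

    /** Runs contended increments against one row and returns the final value. */
    static long runConcurrently(int threads, int incrementsPerThread) {
        ToyFastIncrementRegion region = new ToyFastIncrementRegion();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    region.increment("row-1", 1L);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return region.increment("row-1", 0L); // increment of zero reads the value
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8, 1000)); // prints 8000
    }
}
```

The cost of this ordering is that the sync happens inside the row lock, so writers to the same row serialize behind it; writers to different rows do not wait on each other.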

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 0.94.test.patch, 0.98.test.patch, 
> 1.0.80.flamegraph-7932.svg, 14460.txt, 14460.v0.branch-1.0.patch, 
> 98.80.flamegraph-11428.svg, HBASE-14460-discussion.patch, client.test.patch, 
> flamegraph-13120.svg.master.singlecell.svg, flamegraph-26636.094.100.svg, 
> flamegraph-28066.098.singlecell.svg, flamegraph-28767.098.100.svg, 
> flamegraph-31647.master.100.svg, flamegraph-9466.094.singlecell.svg, 
> hack.flamegraph-16593.svg, hack.uncommitted.patch, m.test.patch, 
> region_lock.png, testincrement.094.patch, testincrement.098.patch, 
> testincrement.master.patch
>
>
> As reported by 鈴木俊裕 up on the mailing list -- see "Performance degradation 
> between CDH5.3.1(HBase0.98.6) and CDH5.4.5(HBase1.0.0)" -- our unification of 
> sequenceid and MVCC slows Increments (and other ops) as the mvcc needs to 
> 'catch up' to our current point before we can read the last Increment value 
> that we need to update.
> We can say that our Increment is just done wrong, we should just be writing 
> Increments and summing on read, but checkAndPut as well as batching 
> operations have the same issue. Fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: hack.uncommitted.patch
hack.flamegraph-16593.svg

Patch for discussion. Gets us back to 0.98 speeds; i.e. about 1/3rd slower than 
0.94.

Idea is to unshackle Increments from MVCC other than to keep MVCC abreast of 
sequenceId change. I can do this if I reorder Increment so the Increment get 
and write (as well as sync) are all under the row lock; this makes it so my 
read will always get the latest since there is no concurrent writer on this row 
(because I have undone the mvcc connection, I need to read with isolation 
UNCOMMITTED). If I reorder Increment so it is read, append, sync, then update 
memstore, I can undo the crazy +1B and the need of the post-modification of 
Cells in MemStore.

The gambit is a slower Increment because all happens under the row lock 
including the sync of the write which used to be done on the outside. This 
makes it so we don't need MVCC for correctness and so can by-pass the 
MVCC-is-a-region-wide-lock phenomenon.

See attached flamegraph. It looks like 0.98 now.

Some basic tests using the above attached IncrementTest (80 concurrent threads 
doing an increment over 50k rows) show us doing:

{code}
75th: 3.92218
95th: 5.64862779997
98th: 8.07254229984
99th: 23.11843173
{code}

The same test against 0.98 as quoted above shows:
{code}
75th: 4.400081
95th: 6.0390387
98th: 6.7202052
99th: 7.26432036001

Time: 191.393
{code}

Posting the patch for discussion.

Need to figure downsides. Will study the patch more. Our Increment in memstore 
should work as expected when Scanning since we are using the actual assigned 
sequenceid. On crash, edit could be in WAL and client may not know it made it 
but this has always been an issue.

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 0.94.test.patch, 0.98.test.patch, 
> 1.0.80.flamegraph-7932.svg, 14460.txt, 98.80.flamegraph-11428.svg, 
> HBASE-14460-discussion.patch, client.test.patch, 
> flamegraph-13120.svg.master.singlecell.svg, flamegraph-26636.094.100.svg, 
> flamegraph-28066.098.singlecell.svg, flamegraph-28767.098.100.svg, 
> flamegraph-31647.master.100.svg, flamegraph-9466.094.singlecell.svg, 
> hack.flamegraph-16593.svg, hack.uncommitted.patch, m.test.patch, 
> region_lock.png, testincrement.094.patch, testincrement.098.patch, 
> testincrement.master.patch


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-15 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: client.test.patch
98.80.flamegraph-11428.svg
1.0.80.flamegraph-7932.svg

Attempts at reproducing the slowdown in the small have failed to pay off. I see 
the roughly 2x difference but not the 7x claimed by the original poster.

With some help from Preston Koprivica, I found that you need some friction in 
place to see the issue; the friction gets amplified by the mvcc wait.

Here is a test that clearly shows the problem. 0.98 is about 33% slower than 
0.94 (0.98 added in mvcc) and then 1.0+ is about 10x the latency WHEN you have 
80 clients running external to the regionserver process banging on it.

The flame graphs show us spending loads of time in mvcc waiting. The stack 
trace is the SAME as for the tests in the small but we just seem to be waiting 
overall longer. There is an amplification going on.

Looking at options:

[~jingcheng...@intel.com]'s suggestion is a nice one. Will narrow what we have 
to wait on. I tried disabling completely our wait-on-mvcc before we read at all 
and this helps; we are only 3x slower than 0.98 (and 4x slower than 0.94).

Need some other bit of trickery to take us closer to what was there before.

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 0.94.test.patch, 0.98.test.patch, 
> 1.0.80.flamegraph-7932.svg, 14460.txt, 98.80.flamegraph-11428.svg, 
> HBASE-14460-discussion.patch, client.test.patch, 
> flamegraph-13120.svg.master.singlecell.svg, flamegraph-26636.094.100.svg, 
> flamegraph-28066.098.singlecell.svg, flamegraph-28767.098.100.svg, 
> flamegraph-31647.master.100.svg, flamegraph-9466.094.singlecell.svg, 
> m.test.patch, region_lock.png, testincrement.094.patch, 
> testincrement.098.patch, testincrement.master.patch


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-14 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HBASE-14460:
-
Attachment: HBASE-14460-discussion.patch

I am thinking about an alternative way to improve the implementation of 
increment, checkAndPut, etc.
In each operation, we could attach a write number per row; then, in an 
increment, could mvcc.await() wait only for the previous operations on this 
row to finish?
I have drafted an ugly patch (only for master) to do this for discussion. I 
ran TestIncrement, and the results are listed below.
{noformat}
1. testContendedSingleCellIncrementer:
  With the patch: 1st run is 228.185s. 2nd run is 232.453s. 3rd run is 235.457s. 4th run is 229.003s.
  Without the patch: 1st run is 230.299s. 2nd run is 234.997s. 3rd run is 219.224s. 4th run is 225.731s.
2. testUnContendedSingleCellIncrementer:
  With the patch: 59.244s.
  Without the patch: 81.667s.
{noformat}

The patch is attached in this JIRA for discussion. Thanks!
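
The per-row wait suggested above can be sketched with a toy model. This is a guess at the shape of the idea, not the contents of HBASE-14460-discussion.patch: ToyPerRowMvcc, WriteEntry, earlierWritesDone and awaitRow are hypothetical names, and rows are plain strings for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy per-row write tracking; hypothetical names, not the attached patch. */
final class ToyPerRowMvcc {
    /** One in-flight write, tagged with the row it touches. */
    static final class WriteEntry {
        final String row;
        final long writeNumber;
        boolean completed;

        WriteEntry(String row, long writeNumber) {
            this.row = row;
            this.writeNumber = writeNumber;
        }
    }

    private final Map<String, Deque<WriteEntry>> pendingByRow = new HashMap<>();
    private long nextWriteNumber = 1;

    /** Starts a write on a row and assigns it the next region-wide write number. */
    synchronized WriteEntry begin(String row) {
        WriteEntry entry = new WriteEntry(row, nextWriteNumber++);
        pendingByRow.computeIfAbsent(row, r -> new ArrayDeque<>()).addLast(entry);
        return entry;
    }

    /** Marks a write done and drops completed entries from the front of its row queue. */
    synchronized void complete(WriteEntry entry) {
        entry.completed = true;
        Deque<WriteEntry> pending = pendingByRow.get(entry.row);
        while (pending != null && !pending.isEmpty() && pending.peekFirst().completed) {
            pending.removeFirst();
        }
        if (pending != null && pending.isEmpty()) {
            pendingByRow.remove(entry.row);
        }
        notifyAll();
    }

    /** True when no older write on the same row is still pending. */
    synchronized boolean earlierWritesDone(WriteEntry entry) {
        Deque<WriteEntry> pending = pendingByRow.get(entry.row);
        return pending == null || pending.isEmpty()
                || pending.peekFirst().writeNumber >= entry.writeNumber;
    }

    /** Blocks only on earlier writes to the same row, not on the whole region. */
    synchronized void awaitRow(WriteEntry entry) throws InterruptedException {
        while (!earlierWritesDone(entry)) {
            wait();
        }
    }

    public static void main(String[] args) {
        ToyPerRowMvcc mvcc = new ToyPerRowMvcc();
        WriteEntry a1 = mvcc.begin("A"); // slow write on row A
        WriteEntry b2 = mvcc.begin("B"); // unrelated row
        WriteEntry a3 = mvcc.begin("A"); // same row as a1
        System.out.println(mvcc.earlierWritesDone(b2)); // true: row B ignores row A
        System.out.println(mvcc.earlierWritesDone(a3)); // false: a1 is still pending
        mvcc.complete(a1);
        System.out.println(mvcc.earlierWritesDone(a3)); // true
    }
}
```

In this model an increment on row B never waits on a slow write to row A, which matches the uncontended improvement reported above; writes to the same row still queue behind each other, which is consistent with the contended numbers barely moving.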

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 0.94.test.patch, 0.98.test.patch, 14460.txt, 
> HBASE-14460-discussion.patch, flamegraph-13120.svg.master.singlecell.svg, 
> flamegraph-26636.094.100.svg, flamegraph-28066.098.singlecell.svg, 
> flamegraph-28767.098.100.svg, flamegraph-31647.master.100.svg, 
> flamegraph-9466.094.singlecell.svg, m.test.patch, region_lock.png, 
> testincrement.094.patch, testincrement.098.patch, testincrement.master.patch


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: 0.98.test.patch
m.test.patch
0.94.test.patch
flamegraph-26636.094.100.svg
flamegraph-28767.098.100.svg
flamegraph-31647.master.100.svg

If I run a test that has 100 threads each updating their own rows -- i.e. no 
contention on a row -- then I see master branch completing before 0.94 does; 
i.e. master is faster. This is in spite of the thread dump resembling that 
reported as problematic up top of this issue.

In 0.94, all are stuck waiting on the WAL syncer to come in:
{code}
"50" #74 daemon prio=5 os_prio=0 tid=0x7f7a78661000 nid=0x3364 waiting for monitor entry [0x7f7a30ecd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1334)
    - waiting to lock <0x0004cde22390> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1476)
    at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:6160)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:5571)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:5454)
    at org.apache.hadoop.hbase.regionserver.TestIncrement$SingleCellIncrementer.run(TestIncrement.java:84)
{code}

In master they are stuck here:
{code}
"17" #55 daemon prio=5 os_prio=0 tid=0x7f0374c6d000 nid=0x3a0b in Object.wait() [0x7f030c346000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.waitForRead(MultiVersionConcurrencyControl.java:218)
    - locked <0x0004d2e26208> (a java.lang.Object)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.completeAndWait(MultiVersionConcurrencyControl.java:149)
    at org.apache.hadoop.hbase.regionserver.MultiVersionConcurrencyControl.await(MultiVersionConcurrencyControl.java:137)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7360)
    at org.apache.hadoop.hbase.regionserver.HRegion.increment(HRegion.java:7315)
    at org.apache.hadoop.hbase.regionserver.TestIncrement$SingleCellIncrementer.run(TestIncrement.java:86)
{code}

The flame graphs show basically the same profile across all versions (master 
spends a bit less time appending, which I suppose is expected).

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 0.94.test.patch, 0.98.test.patch, 14460.txt, 
> flamegraph-13120.svg.master.singlecell.svg, flamegraph-26636.094.100.svg, 
> flamegraph-28066.098.singlecell.svg, flamegraph-28767.098.100.svg, 
> flamegraph-31647.master.100.svg, flamegraph-9466.094.singlecell.svg, 
> m.test.patch, region_lock.png, testincrement.094.patch, 
> testincrement.098.patch, testincrement.master.patch


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-12-12 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: testincrement.094.patch
testincrement.098.patch
testincrement.master.patch
flamegraph-9466.094.singlecell.svg
flamegraph-13120.svg.master.singlecell.svg
flamegraph-28066.098.singlecell.svg

There are two ways in which master is slower than 0.94 increments. There is the 
case where threads are contending to update a single Cell and then there is the 
case described at the head of this issue where the mvcc coordination is acting 
like a region-wide lock though all threads incrementing may not be contending 
on a Cell.
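
The second case, where mvcc behaves like a region-wide lock, can be seen in a toy model of a single read point that only advances over a contiguous run of completed writes. This is a simplified sketch, not HBase's MultiVersionConcurrencyControl: ToyRegionMvcc, WriteEntry and awaitReadPoint are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy region-wide read point; hypothetical names, not HBase's implementation. */
final class ToyRegionMvcc {
    /** One in-flight write, identified only by its write number. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    private final Deque<WriteEntry> queue = new ArrayDeque<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    /** Starts a write and assigns it the next write number, whatever row it touches. */
    synchronized WriteEntry begin() {
        WriteEntry entry = new WriteEntry(nextWriteNumber++);
        queue.addLast(entry);
        return entry;
    }

    /** Marks a write done; the read point advances only past completed head entries. */
    synchronized void complete(WriteEntry entry) {
        entry.completed = true;
        while (!queue.isEmpty() && queue.peekFirst().completed) {
            readPoint = queue.removeFirst().writeNumber;
        }
        notifyAll();
    }

    synchronized long readPoint() {
        return readPoint;
    }

    /** Blocks until the read point reaches this write, i.e. until ALL older writes finish. */
    synchronized void awaitReadPoint(WriteEntry entry) throws InterruptedException {
        while (readPoint < entry.writeNumber) {
            wait();
        }
    }

    public static void main(String[] args) {
        ToyRegionMvcc mvcc = new ToyRegionMvcc();
        WriteEntry rowA = mvcc.begin(); // write 1, say on row A, slow to finish
        WriteEntry rowB = mvcc.begin(); // write 2, on an unrelated row B
        mvcc.complete(rowB);
        System.out.println(mvcc.readPoint()); // prints 0: write 2 is stuck behind write 1
        mvcc.complete(rowA);
        System.out.println(mvcc.readPoint()); // prints 2
    }
}
```

A thread that called awaitReadPoint(rowB) before rowA completed would block even though the two writes share no Cell, which is the amplification that shows up when many clients are in flight.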

Here are some rough measurements of the first case. See attached test. It has 
100 threads doing 10k increments of a single Cell up against a Region Instance.

{code}
0.94 ~84 seconds
0.98 ~140 seconds
master ~180 seconds
{code}

0.98 is roughly 1.7x slower than 0.94 (140s vs. 84s), though the code path 
profile is pretty close if you look at the accompanying flame graphs, and 
master is slower again, more than 2x slower than 0.94 (180s vs. 84s).

Reports from the field have it that even 0.98 increments are too slow as is 
(being about 2x slower, if there are many of them, they can back up all 
handlers so no other work can get in). Hence the above exercise. It seems that 
indeed, even without mvcc unification, increments have gotten slower.

Let me go measure the case where mvcc is getting in the way next.

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 14460.txt, flamegraph-13120.svg.master.singlecell.svg, 
> flamegraph-28066.098.singlecell.svg, flamegraph-9466.094.singlecell.svg, 
> region_lock.png, testincrement.094.patch, testincrement.098.patch, 
> testincrement.master.patch


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: region_lock.png

Here is link to the mailing list where 鈴木俊裕 describes the issue:

http://mail-archives.apache.org/mod_mbox/hbase-dev/201509.mbox/%3ccangerjyo+k+cpskvoqxf7qvk9wzvsnm9jwdnd4q8d11y3mf...@mail.gmail.com%3E

I've attached here the nice diagram he made to illustrate the problem.


> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 14460.txt, region_lock.png


[jira] [Updated] (HBASE-14460) [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed Increments, CheckAndPuts, batch operations

2015-09-21 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-14460:
--
Attachment: 14460.txt

Silly test to demonstrate the problem.

> [Perf Regression] Merge of MVCC and SequenceId (HBASE-HBASE-8763) slowed 
> Increments, CheckAndPuts, batch operations
> ---
>
> Key: HBASE-14460
> URL: https://issues.apache.org/jira/browse/HBASE-14460
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 14460.txt