[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882220#comment-13882220 ] Hudson commented on HBASE-8755: --- FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #65 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/65/]) HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low contention) (stack: rev 1561450)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedLogCloseException.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedSyncBeforeLogCloseException.java
* /hbase/trunk/hbase-server/pom.xml
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
* /hbase/trunk/pom.xml

A new write thread model for HLog to improve the overall HBase write throughput
---
Key: HBASE-8755
URL: https://issues.apache.org/jira/browse/HBASE-8755
Project: HBase
Issue Type: Improvement
Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
Fix For: 0.98.0, 0.99.0
Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out

In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy lock contention on updateLock and flushLock. The only existing optimization, checking the current syncTillHere txid in the hope that some other thread has already written/synced one's own txid to hdfs so that the write/sync can be omitted, actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
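As a rough illustration of the contention problem described above, the old per-handler cycle can be modeled as follows. This is a hypothetical sketch, not the actual pre-patch FSHLog code; the class and method names (OldStyleHLog, put, the stand-in lists) are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the old write path: every handler thread runs the
// whole append => write => sync cycle itself, so all handlers serialize on
// updateLock and flushLock for every single write.
class OldStyleHLog {
    private final Object updateLock = new Object(); // guards append/write
    private final Object flushLock = new Object();  // guards sync
    private final List<String> persisted = new ArrayList<>(); // stands in for hdfs
    private long lastTxid = 0;   // txid of the latest appended edit
    private long syncedTxid = 0; // highest txid known to be synced

    long put(String edit) {
        long txid;
        synchronized (updateLock) {      // 'append' + 'HLog writer append'
            persisted.add(edit);
            txid = ++lastTxid;
        }
        synchronized (flushLock) {       // 'HLog writer sync'
            // The lone optimization: skip the sync when another thread's sync
            // already covered our txid. In practice it helps less than hoped,
            // because handlers still queue on flushLock to find that out.
            if (syncedTxid < txid) {
                long upTo;
                synchronized (updateLock) { upTo = lastTxid; }
                syncedTxid = upTo;       // writer.sync() stand-in
            }
        }
        return txid;
    }

    long syncedTxid() { synchronized (flushLock) { return syncedTxid; } }
    int persistedCount() { synchronized (updateLock) { return persisted.size(); } }
}
```

Even with the skip-if-covered check, every put still passes through both monitors, which is the contention the new model removes.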
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882215#comment-13882215 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK #4856 (See [https://builds.apache.org/job/HBase-TRUNK/4856/]) HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low contention) (stack: rev 1561450)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867544#comment-13867544 ] ramkrishna.s.vasudevan commented on HBASE-8755: --- Nice one.
The change for the new write thread model is as below:
1. All put handler threads append their edits to HLog's local pending buffer (and notify the AsyncWriter thread that there are new edits in the local buffer);
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that covers their txid;
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits from HLog's local pending buffer and writing them to hdfs (hlog.writer.append), then notifying the AsyncFlusher thread that there are new writes to hdfs that need a sync;
4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes made by AsyncWriter, then notifying the AsyncNotifier thread that the sync watermark has increased;
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads that are waiting in the HLog.syncer() function;
6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do the job it did).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
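The pipeline in the steps above can be sketched as follows. This is a heavily simplified, hypothetical model, not the actual FSHLog/HLog API; all names are invented for illustration, and the AsyncNotifier role is folded into the flusher for brevity:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the new pipeline: handlers only append and wait,
// while single AsyncWriter/AsyncFlusher threads batch the writes and syncs.
class MiniHLog {
    private final List<String> pending = new ArrayList<>();   // HLog's local pending buffer
    private final List<String> persisted = new ArrayList<>(); // stands in for hdfs
    private long lastTxid = 0;    // txid assigned to the latest appended edit
    private long writtenTxid = 0; // highest txid written out by AsyncWriter
    private long syncedTxid = 0;  // sync watermark advanced by AsyncFlusher

    MiniHLog() {
        Thread writer = new Thread(this::writerLoop, "AsyncWriter");
        Thread flusher = new Thread(this::flusherLoop, "AsyncFlusher");
        writer.setDaemon(true);
        flusher.setDaemon(true);
        writer.start();
        flusher.start();
    }

    // Steps 1-2: a put handler appends to the local buffer, notifies the
    // writer, then blocks until a sync covers its txid.
    synchronized long append(String edit) throws InterruptedException {
        pending.add(edit);
        long txid = ++lastTxid;
        notifyAll();                 // "new edits in local buffer"
        while (syncedTxid < txid) {
            wait();                  // step 5: woken when the watermark passes txid
        }
        return txid;
    }

    // Step 3: the single AsyncWriter drains the whole buffer and writes it
    // downstream as one batch, so N waiting handlers cost one write.
    private void writerLoop() {
        try {
            while (true) {
                List<String> batch;
                long upTo;
                synchronized (this) {
                    while (pending.isEmpty()) wait();
                    batch = new ArrayList<>(pending);
                    pending.clear();
                    upTo = lastTxid;
                }
                synchronized (this) {
                    persisted.addAll(batch); // hlog.writer.append stand-in
                    writtenTxid = upTo;
                    notifyAll();             // "new writes to hdfs need a sync"
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Step 4: the single AsyncFlusher issues one sync covering everything
    // written so far, then (step 5, AsyncNotifier's role) wakes the waiters.
    private void flusherLoop() {
        try {
            while (true) {
                synchronized (this) {
                    while (writtenTxid == syncedTxid) wait();
                    syncedTxid = writtenTxid; // writer.sync() stand-in
                    notifyAll();              // "sync watermark increased"
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long syncedTxid() { return syncedTxid; }
    synchronized int persistedCount() { return persisted.size(); }
}
```

The throughput gain comes from amortization: however many handlers pile up while one write or sync is in flight, the next drain covers all of them with a single downstream call, instead of each handler contending for its own write/sync cycle.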
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847690#comment-13847690 ] stack commented on HBASE-8755: --- I did a compare over three hours, and throughput flips in favor of this patch the longer we run (15% more writes after three hours). Let me commit. This patch makes for better throughput when under high contention, cleans up the code, and lays groundwork for multiwal with its detaching of syncing from handlers, but I don't like the slowdown at low numbers. Let us fix that promptly. I created a subtask to address this and assigned myself.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847698#comment-13847698 ] Andrew Purtell commented on HBASE-8755: --- Yes [~stack], let's get this out into a near release so we can see how it holds up. If we see perf problems during RC evaluation, we can revert if necessary.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847958#comment-13847958 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK #4722 (See [https://builds.apache.org/job/HBase-TRUNK/4722/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550778)
* /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848055#comment-13848055 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-0.98 #11 (See [https://builds.apache.org/job/HBase-0.98/11/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550782)
* /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848092#comment-13848092 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #8 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/8/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550782)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848207#comment-13848207 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #5 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/5/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550778) * /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out In current write model, each write handler thread (executing put()) will individually go through a full 'append (hlog local buffer) = HLog writer append (write to hdfs) = HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race condition on updateLock and flushLock. The only optimization where checking if current syncTillHere txid in expectation for other thread help write/sync its own txid to hdfs and omitting the write/sync actually help much less than expectation. 
Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing HDFS sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model to HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.

The change for the new write thread model is as below:
1. All put handler threads append their edits to HLog's local pending buffer, and notify the AsyncWriter thread that there are new edits in the local buffer.
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that covers their txid.
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits from HLog's local pending buffer and writing them to HDFS (hlog.writer.append); it notifies the AsyncFlusher thread that there are new writes to HDFS that need a sync.
4. A single AsyncFlusher thread is responsible for issuing a sync to HDFS to persist the writes made by the AsyncWriter; it notifies the AsyncNotifier thread that the sync watermark has increased.
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads that are waiting in the HLog.syncer() function.
6. There is no LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads already do the same job).

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
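The numbered steps describe a producer/consumer pipeline with a sync watermark. The following is an illustrative toy sketch only, not the FSHLog code: the class and member names (WalPipelineSketch, pending, syncedTillHere) are invented, and the AsyncWriter/AsyncFlusher/AsyncNotifier stages are collapsed into a single background thread that advances the watermark and wakes the waiting handler threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch of the proposed model: handlers enqueue txids and block until the
// sync watermark covers them; one daemon thread plays the writer+flusher+notifier roles.
class WalPipelineSketch {
    private final BlockingQueue<Long> pending = new ArrayBlockingQueue<>(1024);
    private final AtomicLong syncedTillHere = new AtomicLong(0);   // sync watermark
    private final Object syncedLock = new Object();

    // Steps 3-5 collapsed: drain buffered txids ("append+sync" would happen here),
    // advance the watermark, and notify all waiting handlers.
    private final Thread asyncWriterFlusher = new Thread(() -> {
        try {
            while (true) {
                long txid = pending.take();                // block until new edits arrive
                synchronized (syncedLock) {
                    if (txid > syncedTillHere.get()) {
                        syncedTillHere.set(txid);
                    }
                    syncedLock.notifyAll();                // step 5: wake waiting handlers
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();            // shutdown
        }
    });

    WalPipelineSketch() {
        asyncWriterFlusher.setDaemon(true);
        asyncWriterFlusher.start();
    }

    // Steps 1-2: a put handler appends its edit, then waits for its txid to be synced.
    void appendAndSync(long txid) {
        try {
            pending.put(txid);
            synchronized (syncedLock) {
                while (syncedTillHere.get() < txid) {
                    syncedLock.wait();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    long syncedWatermark() {
        return syncedTillHere.get();
    }
}
```

Because only the single background thread touches the writer, the per-write contention on updateLock/flushLock in the old model disappears; handlers contend only briefly on the buffer and the watermark monitor.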
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846412#comment-13846412 ] stack commented on HBASE-8755:
--
All the Async* are just hanging out waiting. Missed notification or no notification?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846583#comment-13846583 ] stack commented on HBASE-8755:
--
I notice a syncer exited (it is missing from the thread dump).
2013-12-11 22:40:28,887 INFO [regionserver60020-AsyncHLogSyncer3-1386830380159] wal.FSHLog: regionserver60020-AsyncHLogSyncer3-1386830380159 exiting
On a new run, again a thread exits:
2013-12-12 10:34:34,620 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested
2013-12-12 10:34:35,526 DEBUG [regionserver60020.logRoller] wal.FSHLog: cleanupCurrentWriter waiting for transactions to get synced total 37240 synced till here 37210
2013-12-12 10:34:36,560 INFO [regionserver60020-AsyncHLogSyncer1-1386873205908] wal.FSHLog: regionserver60020-AsyncHLogSyncer1-1386873205908 exiting
Let me try and figure the why.
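A likely way a helper thread ends up "missing from the thread dump" is an unchecked exception escaping run(): the thread dies after printing only its "exiting" line. A hedged sketch (hypothetical class, not HBase code) of capturing the failure cause instead of exiting silently, which is essentially what the later "catching all exceptions" comment does to surface the NPE:

```java
// Hypothetical helper: a loop thread that records any escaping Throwable
// instead of letting it silently kill the thread.
class ResilientLoopThread extends Thread {
    private final Runnable body;
    private volatile Throwable lastError;

    ResilientLoopThread(Runnable body) {
        this.body = body;
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                body.run();                        // one iteration of the sync loop
            }
        } catch (Throwable t) {
            lastError = t;                         // surface the cause for diagnosis
        }
    }

    /** Joins the thread and returns whatever error terminated it (or null). */
    Throwable awaitError() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastError;
    }
}
```

With only an INFO "exiting" log and no caught Throwable, the root cause (here, the NullPointerException found later) never appears in the logs.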
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846657#comment-13846657 ] stack commented on HBASE-8755:
--
Catching all exceptions, I got a NPE.
2013-12-12 11:03:43,870 INFO [Thread-14] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2680455, memsize=129.3 M, hasBloomFilter=true, into tmp file hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/usertable/8dcf9c17c090f476346c8a31e4c9eddb/.tmp/64aaf14b38224f4fbce0a999f92dd8f4
2013-12-12 11:03:43,879 INFO [regionserver60020-AsyncHLogSyncer2-1386874930310] wal.FSHLog: UNEXPECTED
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1205)
at java.lang.Thread.run(Thread.java:744)
Writer is null.
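"Writer is null" suggests a race with the log roller: during a roll the writer reference is momentarily cleared or swapped while the syncer dereferences it. A minimal illustration (hypothetical names, not the actual FSHLog fix) of reading the volatile reference once into a local and skipping the sync while a roll is in progress:

```java
// Hypothetical sketch: guard a shared writer reference that a log roll may
// momentarily null out, instead of dereferencing it blindly.
class WriterGuardSketch {
    interface Writer {
        void sync();
    }

    private volatile Writer writer;          // null while a roll is in progress

    void beginRoll() {
        writer = null;                       // roller detaches the old writer
    }

    void finishRoll(Writer next) {
        writer = next;                       // roller attaches the new writer
    }

    /** Returns true if a sync was issued, false if a roll was in progress. */
    boolean trySync() {
        Writer w = writer;                   // single volatile read into a local
        if (w == null) {
            return false;                    // retry on the next wakeup; no NPE
        }
        w.sync();
        return true;
    }
}
```

The key detail is the single read into a local: checking `writer != null` and then dereferencing the field again would re-read it and could still observe null in between.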
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846984#comment-13846984 ] Hadoop QA commented on HBASE-8755:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618499/8755v8.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is a snippet of the errors:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure:
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,42] unclosed string literal
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,63] ';' expected
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,17] illegal start of expression
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,23] ';' expected
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
--
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729)
at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:128)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
{code}
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8148//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846994#comment-13846994 ] stack commented on HBASE-8755:
--
This is what I'll commit. I've been running it on small cluster this afternoon and after fixing hardware, it seems to run fine at about the same speed as what we have currently (ycsb read/write loading).
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846997#comment-13846997 ] stack commented on HBASE-8755:
--
32 threads on one node writing a cluster of 4 nodes (~8 threads per server, which according to our tests to date shows this model running slower than what we have). It does 10% less throughput after ~25 minutes. We need to get the other speedups in after this goes in.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847069#comment-13847069 ] Hadoop QA commented on HBASE-8755:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618509/8755v9.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847194#comment-13847194 ] Feng Honghua commented on HBASE-8755:
--
[~stack] thanks. seems no further blocking issue?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847195#comment-13847195 ] Feng Honghua commented on HBASE-8755:
--
bq. It does 10% less throughput after ~25 minutes.
It's normal when flush/compact occurs after ~25 minutes of writing; we saw the same level of downgrade when doing long-running tests both with and without the patch.
I applied this new write thread model to HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846090#comment-13846090 ] stack commented on HBASE-8755: -- Pardon me. This is taking a while; hardware issues, and now trunk seems to have an issue where it hangs syncing, pre-patch I believe... investigating. Here is what I see. Lots of threads BLOCKED here:
{code}
"RpcServer.handler=0,port=60020" daemon prio=10 tid=0x012f1800 nid=0x3cb5 waiting for monitor entry [0x7fdb0eb55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1006)
	- waiting to lock 0x000456c00390 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1054)
	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
	at java.lang.Thread.run(Thread.java:744)
{code}
Then the fella w/ the lock is doing this:
{code}
"regionserver60020.logRoller" daemon prio=10 tid=0x01159800 nid=0x3ca7 in Object.wait() [0x7fdb0f964000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1307)
	- locked 0x000456bf37a8 (a java.util.concurrent.atomic.AtomicLong)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1299)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1412)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:760)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:566)
	- locked 0x000456c00390 (a java.lang.Object)
	- locked 0x000456c00330 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
	at java.lang.Thread.run(Thread.java:744)
{code}
Server is bound up.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846107#comment-13846107 ] stack commented on HBASE-8755: -- Hmm... Happens when this patch is in place. Stuck here:
{code}
"regionserver60020.logRoller" daemon prio=10 tid=0x7f6f08822800 nid=0x5b0a in Object.wait() [0x7f6eeccef000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1304)
	- locked 0x00045756db98 (a java.util.concurrent.atomic.AtomicLong)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1296)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1409)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:759)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:565)
	- locked 0x00045756dc70 (a java.lang.Object)
	- locked 0x00045756dc10 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
	at java.lang.Thread.run(Thread.java:744)
{code}
Which is here:
{code}
1294   // sync all known transactions
1295   private void syncer() throws IOException {
1296     syncer(this.unflushedEntries.get()); // sync all pending items
1297   }
1298
1299   // sync all transactions upto the specified txid
1300   private void syncer(long txid) throws IOException {
1301     synchronized (this.syncedTillHere) {
1302       while (this.syncedTillHere.get() < txid) {
1303         try {
1304           this.syncedTillHere.wait();
1305
1306           if (txid <= this.failedTxid.get()) {
1307             assert asyncIOE != null :
1308               "current txid is among(under) failed txids, but asyncIOE is null!";
1309             throw asyncIOE;
1310           }
1311         } catch (InterruptedException e) {
1312           LOG.debug("interrupted while waiting for notification from AsyncNotifier");
1313         }
1314       }
1315     }
1316   }
{code}
All other threads are trying to do an appendNoSync:
{code}
"RpcServer.handler=0,port=60020" daemon prio=10 tid=0x7f6f08a26800 nid=0x5b1b waiting for monitor entry [0x7f6eebee1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1005)
	- waiting to lock 0x00045756dc70 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1053)
	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
	at java.lang.Thread.run(Thread.java:744)
{code}
... but can't make progress, blocked on updateLock.
{code}
 995   private long append(HRegionInfo info, TableName tableName, WALEdit edits, List<UUID> clusterIds,
 996       final long now, HTableDescriptor htd, boolean doSync, boolean isInMemstore,
 997       AtomicLong sequenceId, long nonceGroup, long nonce) throws IOException {
 998     if (edits.isEmpty()) return this.unflushedEntries.get();
 999     if (this.closed) {
1000       throw new IOException("Cannot append; log is closed");
1001     }
1002     TraceScope traceScope = Trace.startSpan("FSHLog.append");
1003     try {
1004       long txid = 0;
1005       synchronized (this.updateLock) {
1006         // get the sequence number from the passed Long. In normal flow, it is coming from the
1007         // region.
1008         long seqNum = sequenceId.incrementAndGet();
...
{code}
The update lock is held when rolling the log here:
{code}
562     synchronized (updateLock) {
563       // Clean up
{code}
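The stall in these traces, rollWriter() holding updateLock while syncer() waits on the watermark, with every handler blocked trying to enter append(), can be reproduced in miniature. The sketch below is a hypothetical demo (RollStallDemo and its fields are illustrative names, and a timeout is added so the demo terminates), not HBase code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the stall pattern above: the "roller" holds
// updateLock and then waits on the sync watermark (with a timeout here so the
// demo terminates; FSHLog's syncer() waits indefinitely), while a "handler"
// blocks trying to enter the same updateLock.
class RollStallDemo {
    static final Object updateLock = new Object();
    static final Object syncedTillHere = new Object(); // stands in for the watermark monitor

    // Returns true if the handler managed to enter updateLock while the roller
    // was still parked inside it (it should not manage to).
    static boolean handlerEnteredLock(long holdMillis) {
        CountDownLatch rollerHasLock = new CountDownLatch(1);
        CountDownLatch handlerDone = new CountDownLatch(1);
        Thread roller = new Thread(() -> {
            synchronized (updateLock) {          // rollWriter() takes updateLock...
                rollerHasLock.countDown();
                synchronized (syncedTillHere) {  // ...then syncer() waits on the watermark;
                    try {                        // wait() releases syncedTillHere, NOT updateLock
                        syncedTillHere.wait(holdMillis);
                    } catch (InterruptedException ignored) { }
                }
            }
        });
        Thread handler = new Thread(() -> {
            synchronized (updateLock) {          // append() needs updateLock too
                handlerDone.countDown();
            }
        });
        try {
            roller.start();
            rollerHasLock.await();               // roller is definitely inside updateLock now
            handler.start();
            // Wait half the hold time: the handler stays blocked the whole while.
            boolean entered = handlerDone.await(holdMillis / 2, TimeUnit.MILLISECONDS);
            roller.join();
            handler.join();
            return entered;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Unless some other thread advances the watermark and calls notifyAll on syncedTillHere, the roller never leaves syncer(), which is exactly why the Async* threads' stack traces are needed to see who is supposed to wake it.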
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846110#comment-13846110 ] stack commented on HBASE-8755: -- Will provide more on this tomorrow.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846130#comment-13846130 ] Feng Honghua commented on HBASE-8755: - The stack traces are OK: when a log roll occurs, it holds the updateLock to prevent any subsequent write handler from putting edits into pendingWrites (that's why all write handler threads pend on updateLock), and then it calls *sync* to wait for the Async* threads to sync all edits currently in pendingWrites (that's why the logroller pends in sync())... As to why there is no progress, we need to see why the Async* threads don't finish syncing the current pendingWrites; would you please provide the Async* threads' stack traces? Of the Async* threads, only AsyncWriter needs the pendingWritesLock (to grab the edits from pendingWrites), and AsyncNotifier needs syncTillHere (to update it and notifyAll). Both should be acquirable in the current situation: write handler threads can't hold pendingWritesLock without first holding updateLock (within append()), the logroller doesn't hold pendingWritesLock at all, and the logroller is only waiting for syncTillHere to change...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844108#comment-13844108 ] Hadoop QA commented on HBASE-8755: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617842/HBASE-8755-trunk-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844827#comment-13844827 ] stack commented on HBASE-8755: -- All sounds good above [~fenghh]. I'm running a few tests here on cluster to see it basically works. Any other comments by anyone else? Otherwise, I was planning on committing. We can work on further speedups in new issues; e.g. see if we can do with fewer threads as per [~himan...@cloudera.com].
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844838#comment-13844838 ] Ted Yu commented on HBASE-8755: --- +1 from me.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844839#comment-13844839 ] Jonathan Hsieh commented on HBASE-8755: --- I did most of a review of v4 last week -- here are a few nits: nit: (fix on commit)
{code}
+    // up and holds the lock
+    // NOTE! can't hold 'upateLock' here since rollWriter will pend
+    // on 'sync()' with 'updateLock', but 'sync()' will wait for
+    // AsyncWriter/AsyncSyncer/AsyncNotifier series. without upateLock
+    // can leads to pendWrites more than pendingTxid, but not problem
{code}
spelling: upate -> update. This can go in a follow-up issue -- and please add a description of the threads / queues / invariants and how a WAL write happens in the class javadoc. An updated version of the 1-6 list in the description would be great. Good stuff [~fenghh]!
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844859#comment-13844859 ] Himanshu Vashishtha commented on HBASE-8755: Thanks for the explanation [~fenghh]; it pretty much answers all my questions. Also, looking more, getting rid of the LogSyncer thread eases the locking semantics of rolling. Yes, I reran the above experiments in a more standard environment (4 DNs with HLogPE running on a DN, and log level set to INFO instead of DEBUG), and got mixed results this time. I varied threads from 2 to 100 and didn't get a clear winner. Given the current state of this patch and the cleanup it does, I am +1 for committing this. Looking forward to it. Thanks.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845035#comment-13845035 ] Feng Honghua commented on HBASE-8755: - Thanks [~jmhsieh], I've made and attached a new patch based on your comment. Thanks everyone again for your valuable comment and feedback.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845038#comment-13845038 ] Feng Honghua commented on HBASE-8755: - [~v.himanshu]: Looking forward to your performance comparison test results (trunk, with-this-patch, with-syncer-only) using the latest HLogPE with the new 'appendWithoutSync + sync' logic. I'll also try to do the same test for double comparison/confirmation. Performance improvement is the first and foremost goal of this patch; the code cleanup is just a by-the-way side effect. So we want to see this patch accepted/checked in because of the performance improvement it brings, not because of the code cleanup it does :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845103#comment-13845103 ] Hadoop QA commented on HBASE-8755: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618171/HBASE-8755-trunk-v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//console This message is automatically generated. 
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845106#comment-13845106 ] stack commented on HBASE-8755: -- I'm running some tests local just to make sure. Will report back...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843224#comment-13843224 ] Feng Honghua commented on HBASE-8755: - [~stack]: thanks for the comments; below are my responses after making the corresponding changes. A new patch based on [~v.himanshu]'s latest v5 patch is attached (thanks [~v.himanshu]). bq. Remove these asserts rather than comment them out given they depended on a facility this patch removes. == done bq. using a Random for choosing an arbitrary thread from a list of 4 is heavyweight == done bq. Please remove all mentions of AsyncFlush since it no longer exists == done bq. Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE; Should it be == failedTxid? == this comment is right: a txid larger than failedTxid isn't synced by the sync that reports failedTxid, but a txid smaller than or equal to failedTxid is failed (not 'must have failed', but since we don't maintain a txid-range-to-syncer mapping, we fail all txids smaller than or equal to failedTxid; this aligns with HBase's write semantics of 'a failed write may in fact have succeeded'. This is a point we can refine later by adding a txid-range-to-sync-operation mapping to indicate failure precisely.) bq. This should be volatile since it is set by AsyncSync and then used by the main FSHLog thread (you have an assert to check it not null -- maybe you ran into an issue here already?): + private IOException asyncIOE = null; == done bq. 'bufferLock' is a very generic name. Could it be more descriptive? It is a lock held for a short while while AsyncWriter moves queued edits off the globally seen queue to a local queue just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held. Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list holding the pending writes is pendingWrites.) == done bq.
(because the HDFS write-method is pretty heavyweight as far as locking is concerned.) I think the heavyweight referred to in the above is hbase locking... please adjust the comment == done bq. Comments on what these threads do will help the next code reader == done bq. Your patch should remove the optional flush config from hbase-default.xml too since it no longer is relevant == done bq. A small nit is you might look at other threads in hbase and see how they are named... It might be good if these better align == done bq. Probably make the number of asyncsyncers a configuration == done bq. but we do not seem to be doing it on the other call to doWrite at around line #969 inside in append == doWrite is called inside append, and the bufferLock (now renamed to pendingWritesLock) is held there bq. This method (setPendingTxid) is only called at close time if I read the patch right == it's called inside append() once doWrite() is done, to notify AsyncWriter that there are new pendingWrites to write to HDFS; it's not called at close time. You can double-check it :-) bq. Is this 'fatal'? Or is it an 'error'? == done bq. and request a log roll yet we carry on to try and sync, an op that will likely fail? We are ok here? We updated the write txid but not the sync txid so that should be fine. == we can't retry the sync after a log roll since we can't sync to a new hlog when the writes were written to the old hlog. We fail all the transactions with txid <= write txid, so it's ok here. bq. Do we need this: if (!asyncSyncers[ i ].isSyncing())? DFSClient will allow us to call sync concurrently. == I think DFSClient allows us to call sync concurrently... HDFS will handle (synchronize) concurrent syncs? bq. Can these be static classes or do they need context from the hosting FSHLog? == they all need context from the hosting FSHLog (such as writer/asyncWriter/asyncSyncer/asyncNotifier) bq. These method names should not talk about 'flush'. They should be named 'sync' instead. Same for the flushlock.
== done bq. Why atomic boolean and not just a volatile here? private AtomicBoolean isSyncing = new AtomicBoolean(false); == done bq. The above is very important. All your threads do this? == yes bq. It talks about writeChunk being expensive but we are not doing anything to ameliorate dfsclient writes if I dig down into our log writer == done (removed it)
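The "fail everything at or below failedTxid" semantics defended in this review exchange can be illustrated with a small standalone helper. The class and method names are hypothetical (this is not code from the patch): without a txid-range-to-sync mapping, every waiter at or below the failed watermark must be failed, even though some of those writes may actually have reached HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailedTxidSemantics {
    /**
     * Given the txids that handler threads are waiting on and the txid reported
     * by a failed sync, return the txids that must be failed. Everything at or
     * below failedTxid is failed because we cannot tell which of those writes
     * actually made it out; this matches HBase's "a failed write may in fact
     * have succeeded" semantics.
     */
    static List<Long> txidsToFail(List<Long> waitingTxids, long failedTxid) {
        List<Long> out = new ArrayList<>();
        for (long t : waitingTxids) {
            if (t <= failedTxid) out.add(t);
        }
        return out;
    }
}
```

In Himanshu's t1/t2 example, a sync failure reported at txid 10 fails the waiters on txids 8 and 10 alike, even if txid 8's write actually succeeded.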
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843227#comment-13843227 ] Feng Honghua commented on HBASE-8755: - I'll read and update according to [~v.himanshu]'s comment tomorrow. thanks.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843231#comment-13843231 ] Jean-Marc Spaggiari commented on HBASE-8755: For the random, you can also use something like System.currentTimeMillis() % asyncSyncers.length; Not saying that yours is not correct ;)
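For reference, the cheap picker variants discussed here (wall clock, txid, and a non-allocating random) could look like the following sketch. `poolSize` stands in for `asyncSyncers.length`, and the class/method names are illustrative, not from the patch; the txid-based pick has the nice property of spreading consecutive txids round-robin across the syncer pool.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SyncerPicker {
    // Pick an index into the syncer pool without constructing a new
    // java.util.Random per call (the "heavyweight" cost flagged in review).
    static int pickByTxid(long txid, int poolSize) {
        return (int) (txid % poolSize);        // round-robins consecutive txids
    }

    static int pickByClock(int poolSize) {
        return (int) (System.currentTimeMillis() % poolSize);
    }

    static int pickRandomly(int poolSize) {
        // ThreadLocalRandom reuses a per-thread generator: no allocation,
        // no seeding, and no contention between handler threads.
        return ThreadLocalRandom.current().nextInt(poolSize);
    }
}
```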
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843925#comment-13843925 ] Feng Honghua commented on HBASE-8755: - [~v.himanshu]: bq. 1) log rolling thread safety: Log rolling happens in parallel with flush/sync. Currently, in FSHLog, the sync call grabs the updateLock to ensure it has a non-null writer (because of parallel log rolling). How does this patch address the non-null writer? Or is it not needed anymore? Also, if you go for the updateLock in sync, that might result in deadlock. == Good question :-). In rollWriter(), before switching this.writer to the newly created writer, *updateLock is held* and *cleanupCurrentWriter() is called*. Holding updateLock guarantees that no new edits enter pendingWrites, and calling cleanupCurrentWriter() guarantees that all edits currently in pendingWrites have been written to HDFS and synced (inside this method, 'sync()' is called to provide this guarantee). This means that when switching this.writer to the new HLog writer, no new edits enter and all current edits have already been synced to HDFS; the AsyncWriter/AsyncSyncer threads have nothing to do and are idle, so there is no log rolling thread safety issue here. bq. 2) Error handling: It is not very clear how flush/sync failures are being handled?... Let's say there are two handlers waiting for sync, t1 on txid 8 and t2 on txid 10. And t1 wakes up on notification. Would t1 also get this exception? Wouldn't it be wrong, because txid 8 may have succeeded? Please correct me if I missed anything. == Your understanding here is correct, but the write semantics of HBase are 'a successful write response means a successful write, but a failed write response can mean either a successful write or a failed write', right? I have already mentioned this in the comment above.
This behavior can be improved by adding a txid-to-txid-range mapping, which could indicate exactly which pending txid range a failed txid fails. bq. 3) I think the current HLogPE doesn't do justice to the real use case... Almost all HLog calls are appendNoSync, followed by a sync call. In the current HLogPE, we are calling append calls, which also do the sync. == You're right, but looking at the write process as a whole, the two have no big performance difference: append() calls sync() inside, and in the real case of HRegion.java, appendNoSync and sync are called. Since sync now just pends on a notification, the difference between the two behaviors is only whether the thread pends on the notification sooner or later, with no impact on overall write performance (after putting its edits into pendingWrites, the write handler thread can only wait for the write/sync to HDFS to finish and can't help/influence write performance)... right? bq. 4) Perf numbers are super impressive. It would have been wonderful to have such numbers for a smaller number of handler threads also (e.g., 5-10 threads). IMHO, that represents the most common scenario, but I could be wrong. I know this has been beaten to death in the discussions above, but just echoing my thoughts here. == I think many folks may hold the same point of view here, but I also have my own thoughts on this :-) Throughput is different from latency: latency represents how quickly a system performs a user request, the quicker the better; while throughput represents how many user requests a system can perform within a given time frame, the more the better. Both indicate a system's capability for performing user requests, but from different angles.
Certainly, for applications with low-to-medium write stress, fewer write threads within the client issuing requests is OK; but for applications with high write stress, users/clients may feel bad if the system just can't serve/reach their real-world throughput, *no matter* how many client threads are configured/added. With the improvement of this patch, at least we *can* satisfy such a high throughput requirement by adding client threads. And without improving an individual write request's latency (this patch does nothing for individual write latency), it's hard to improve throughput for a *synchronous* client write thread (it can issue the next request only after it's done with the current one); that's why this patch has less effect for a single or few write threads. If HBase supported *asynchronous* client write threads, I think this patch could provide a big improvement even with few client write threads. bq. 5) I should also mention that while working on a different use case, I was trying to bring a layer of indirection b/w regionserver handlers and sync operation (Sync is the most costly affair in all HLog story). ... == Glad to see a similar approach :-). I wonder if it shows the same improvement under heavy write stress (such as 50/100/200 client write threads). Looking forward to seeing your final test results :-)
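The log-rolling ordering argued in 1) above (hold updateLock so no new edits enter, drain and sync what is buffered, only then swap the writer) can be sketched as follows. This is an illustrative stand-in with invented names, not the actual FSHLog code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the rollWriter() safety argument: while updateLock is held,
// no append() can add to pendingWrites, so draining + "syncing" the buffer and
// then swapping the writer leaves nothing in flight against the old writer.
public class RollWriterSketch {
    private final Object updateLock = new Object();
    private final List<String> pendingWrites = new ArrayList<>();
    private final List<String> durable = new ArrayList<>(); // stands in for hdfs
    private String writer = "hlog-1";

    void append(String edit) {
        synchronized (updateLock) { pendingWrites.add(edit); }
    }

    void rollWriter(String newWriter) {
        synchronized (updateLock) {          // blocks new appends for the duration
            durable.addAll(pendingWrites);   // cleanupCurrentWriter(): write + sync
            pendingWrites.clear();           // all buffered edits before the switch
            writer = newWriter;              // safe: buffer empty, nothing in flight
        }
    }

    public static void main(String[] args) {
        RollWriterSketch log = new RollWriterSketch();
        log.append("edit-a");
        log.append("edit-b");
        log.rollWriter("hlog-2");
        System.out.println("durable=" + log.durable.size() + " writer=" + log.writer);
    }
}
```

The key ordering property is that the swap happens inside the same critical section as the drain, so no edit can land in pendingWrites between the sync and the writer switch.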
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843926#comment-13843926 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] bq. For the random, you can also use something like System.currentTimeMillis() % asyncSyncers.length == Yeah, System.currentTimeMillis() is a good candidate. txid can do the same thing as well. Thanks :-) A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Attachments: 8755-syncer.patch, 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-v5.patch In the current write model, each write handler thread (executing put()) will individually go through a full 'append (hlog local buffer) -> HLog writer append (write to hdfs) -> HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race condition on updateLock and flushLock. The only optimization, where a thread checks whether the current syncTillHere >= its txid in the expectation that some other thread has helped write/sync its txid to hdfs so it can omit the write/sync, actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing the hdfs sequence file, and the prototype implementation shows a 4X improvement for throughput (from 17000 to 70000+).
I apply this new write thread model in HLog, and the performance test in our test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested. The change for the new write thread model is as below:
1. All put handler threads append the edits to HLog's local pending buffer (and notify the AsyncWriter thread that there are new edits in the local buffer);
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that contains their txid;
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits in HLog's local pending buffer and writing them to hdfs (hlog.writer.append), then notifying the AsyncFlusher thread that there are new writes to hdfs that need a sync;
4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter, then notifying the AsyncNotifier thread that the sync watermark has increased;
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads waiting in the HLog.syncer() function;
6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads already do the same job).
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
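Steps 1-5 above can be sketched as a minimal, hypothetical Java pipeline. A BlockingQueue stands in for the pending buffer, a long watermark for the hdfs sync state, and steps 4 and 5 are merged into one thread for brevity; none of the names below are the actual FSHLog code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Handler -> AsyncWriter -> AsyncSyncer/Notifier handoff chain in miniature.
public class WalPipelineSketch {
    // Step 1: handlers append edits (here just txids) to a pending buffer.
    static final BlockingQueue<Long> pendingWrites = new LinkedBlockingQueue<>();
    // Step 3 output: txids appended to the (simulated) hdfs file, awaiting sync.
    static final BlockingQueue<Long> unsynced = new LinkedBlockingQueue<>();
    // Steps 4-5: highest txid known durable; handlers wait on this monitor.
    static final Object syncMonitor = new Object();
    static long syncedUpTo = 0;

    public static void main(String[] args) throws Exception {
        Thread asyncWriter = new Thread(() -> {            // step 3: drain buffer
            try { while (true) unsynced.put(pendingWrites.take()); }
            catch (InterruptedException e) { }
        });
        Thread asyncSyncer = new Thread(() -> {            // steps 4-5: sync + notify
            try {
                while (true) {
                    long txid = unsynced.take();           // a real hsync would go here
                    synchronized (syncMonitor) { syncedUpTo = txid; syncMonitor.notifyAll(); }
                }
            } catch (InterruptedException e) { }
        });
        asyncWriter.setDaemon(true); asyncSyncer.setDaemon(true);
        asyncWriter.start(); asyncSyncer.start();

        // Steps 1-2: a handler appends txid 42 and blocks until the sync covers it.
        pendingWrites.put(42L);
        synchronized (syncMonitor) {
            while (syncedUpTo < 42L) syncMonitor.wait();
        }
        System.out.println("txid 42 synced");
    }
}
```

The point of the structure is that handlers only enqueue and wait: the single writer batches whatever has accumulated, and one sync can cover many handlers' txids at once.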
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841093#comment-13841093 ] Andrew Purtell commented on HBASE-8755: --- This work looks pretty far along. If we can get it in soon I would be willing to try putting this into 0.98 so this major improvement can manifest in a release. Exciting results.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840001#comment-13840001 ] Himanshu Vashishtha commented on HBASE-8755: This is some awesome stuff happening here. I have a few comments apart from what Stack already mentioned.
1) log rolling thread safety: Log rolling happens in parallel with flush/sync. Currently, in FSHLog, the sync call grabs the updateLock to ensure it has a non-null writer (because of parallel log rolling). How does this patch address the non-null writer? Or is it not needed anymore? Also, if you go for the updateLock in sync, that might result in deadlock.
2) Error handling: It is not very clear how flush/sync failures are being handled. For example, if a write fails for txid 10, it notifies AsyncSyncer that the writer is done with txid 10. And AsyncSyncer notifies the notifier thread, which finally notifies the blocked handlers using notifyAll. The handler checks for the failedTxid here: {code} + if (txid <= this.failedTxid.get()) { {code} Let's say there are two handlers waiting for sync, t1 on txid 8 and t2 on txid 10, and t1 wakes up on notification. Would t1 also get this exception? Wouldn't that be wrong, because txid 8 may have succeeded? Please correct me if I missed anything.
3) I think the current HLogPE doesn't do justice to the real use case. Almost *all* HLog calls are appendNoSync, followed by a sync call. In the current HLogPE, we are calling append calls, which also do the sync. When I changed it to represent the above common case, the performance numbers of current FSHLog using HLogPE improve quite a bit. I still need to figure out the reason, but the HLogPE change affects the perf numbers considerably IMO.
For example, on a 5 node cluster, I see this difference on trunk: Earlier: {code} Summary: threads=3, iterations=10 took 218.590s 1372.432ops/s {code} Post HLogPE change: {code} Summary: threads=3, iterations=10 took 172.648s 1737.640ops/s {code} I think it would be great if we can test this with the corrected HLogPE. What do you think Fenghua?
4) Perf numbers are super impressive. It would have been wonderful to have such numbers for a smaller number of handler threads also (e.g., 5-10 threads). IMHO, that represents the most common case scenario, but I could be wrong. I know this has been beaten to death in the discussions above, but just echoing my thoughts here.
5) I should also mention that while working on a different use case, I was trying to bring a layer of indirection b/w regionserver handlers and the sync operation (sync is the most costly affair in the whole HLog story). What resulted is a separate set of syncer threads, which do the work of flushing the edits and syncing to HDFS. This is what it looks like:
a) The handlers append their entries to the FSHLog buffer as they do currently.
b) They invoke the sync API. There, they wait on the syncer threads to do the work for them and notify.
c) It results in a batched sync effort but without much extra locking/threads.
Basically, it is similar to what you did here but minus the Writer/Notifier threads and minus the bufferLock. I mention it here because with that approach I see some perf improvement even with a smaller number of handler threads. And it also keeps the current deferredLogFlush behavior. This work is still in progress and is still a prototype. It would be great to know your take on it. Here are some numbers on a 5 node cluster; hdfs 2.1.0; hbase is trunk; client on a different node. I haven't tested it with a larger number of threads but it would be good to compare I think (it's 2am here...). It uses 3 syncers at the moment (and varying it would be a good thing to experiment with too).
Also, 'without patch' in the below table means trunk + the HLogPE patch.
||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|3|172.648s|1737.640ops/s|170.332s|1761.266ops/s|
|3|170.977s|1754.622ops/s|174.568s|1718.528ops/s|
|5|213.738s|2339.313ops/s|191.119s|2616.171ops/s|
|5|211.072s|2368.860ops/s|189.671s|2636.144ops/s|
|10|254.641s|3926.419ops/s|216.494s|4619.319ops/s|
|10|251.503s|3978.564ops/s|215.333s|4643.266ops/s|
|10|251.692s|3970.579ops/s|217.151s|4605.854ops/s|
|20|648.943s|6163.870ops/s|646.279s|6189.277ops/s|
|20|658.654s|6072.991ops/s|656.277s|6094.987ops/s|
|25|282.654s|8861.991ops/s|249.277s|10033.987ops/s|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840074#comment-13840074 ] stack commented on HBASE-8755: -- Nice review [~himan...@cloudera.com] On 3., above, yes, that is true. HLogPE does not seem to be representative, as you suggest. But does your change below
{code}
-hlog.append(hri, hri.getTable(), walEdit, now, htd, region.getSequenceId());
+// this is how almost all users of HLog use it (all but compaction calls).
+long txid = hlog.appendNoSync(hri, hri.getTable(), walEdit, clusters, now, htd,
+    region.getSequenceId(), true, nonce, nonce);
+hlog.sync(txid);
{code}
... bring it closer to a 'real' use case? I see over in HRegion that we do a bunch of appendNoSync calls in minibatch, or even in put, before we call sync. Should we append more than just one set of edits before we call the sync? I suppose on a regionserver with a load of regions loaded up on it, all these syncs can come crashing in on top of each other onto the underlying WAL in an arbitrary manner -- something Feng Honghua's patch mitigates some by making it so syncs are done when FSHLog thinks it appropriate rather than when some arbitrary HRegion call thinks it right ... and this is probably part of the reason for the perf improvement. Could we better regulate the sync calls so they are even less arbitrary? Even them out? It could make for better performance if there were a mechanism against syncs clumping together. Looking at your patch, the syncer is very much like Feng Honghua's -- it is interesting that you two independently came up w/ a similar multithreaded syncing mechanism. That would seem to 'prove' this is a good approach. Feng's patch is much further along, with a bunch of cleanup of FSHLog. Will wait on his comments on what he thinks of doing without AsyncWriter and AsyncNotifier. Looks like your patch is far enough along for us to do tests comparing the approaches?
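The appendNoSync-then-sync pattern stack describes (batch several cheap appends, then pay for one sync covering the whole batch) can be shown with a tiny standalone sketch. FakeLog is a made-up stand-in for HLog so the pattern runs without HBase; only the appendNoSync/sync shape mirrors the diff above:

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates why batching matters: N appendNoSync calls cost one sync, not N.
public class BatchedSyncSketch {
    static class FakeLog {
        private long txid = 0;
        final List<Long> syncs = new ArrayList<>();        // records each sync issued
        long appendNoSync(String edit) { return ++txid; }  // buffer only, no I/O
        void sync(long upTo) { syncs.add(upTo); }          // one sync covers the batch
    }

    public static void main(String[] args) {
        FakeLog hlog = new FakeLog();
        int batchSize = 4;
        long lastTxid = 0;
        for (int i = 0; i < batchSize; i++) {
            lastTxid = hlog.appendNoSync("edit-" + i);     // cheap: just buffers
        }
        hlog.sync(lastTxid);                               // single expensive sync
        System.out.println("syncs=" + hlog.syncs.size() + " upTo=" + lastTxid);
    }
}
```

With the old append() (which synced internally), the same four edits would have cost four syncs; this is the gap the corrected HLogPE is meant to measure.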
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840663#comment-13840663 ] Jonathan Hsieh commented on HBASE-8755: --- I posted v4 for trunk on reviewboard and will be reviewing there. https://reviews.apache.org/r/16052/
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840710#comment-13840710 ] stack commented on HBASE-8755: -- Let me look into changing HLogPE after Himanshu's suggestion above. I'll add an option to batch up edits some. My sense is that it will make the difference between current code and Honghua's patch even larger when the count of threads is low but that it will not change the numbers when thread count is high.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840733#comment-13840733 ] Himanshu Vashishtha commented on HBASE-8755: I agree on the HLogPE comment. Yes, grouping appendNoSync calls before calling sync is the right way to go. The existing one is way off the real use. I figured that the 'w/o patch' column in the above table contains Feng's patch too. I am re-running the experiments at the moment with three versions: trunk, trunk + Feng's patch, trunk + the syncer approach. All versions have that HLogPE fix on them; I will report back the numbers once done. Going by what we are seeing here, batching sync calls is definitely the right way IMO. I agree that Feng Honghua's patch has been tested well enough, and I really like the radical cleanup it does. The code reads pretty clean now; though it involves far more threads and synchronization stuff (which makes it more interesting to debug too :)), I just wanted to ensure that it is safe. The reason I mentioned this different approach is that it adds fewer threads (while keeping the current behaviour), and also shows improvement with a smaller number of handlers, which to me looks like a nice win over the current FSHLog. This is still at a prototype stage, and I absolutely don't want to block Feng's superb piece of work here. It would be good to know his thoughts on this. Thanks.
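A pool of syncer threads, like the 3-syncer prototype mentioned above, needs a cheap way to pick which syncer handles a request; the comments earlier discuss txid (or System.currentTimeMillis()) modulo pool size instead of a Random. A minimal sketch of that selection scheme, with invented names and the pool reduced to a counter array:

```java
// Hypothetical illustration of spreading sync work across a small syncer pool
// via txid % pool-size, one of the selection schemes discussed in the comments.
public class SyncerPoolSketch {
    public static void main(String[] args) {
        final int POOL = 3;                   // e.g. the 3 syncers in the prototype
        int[] workPerSyncer = new int[POOL];
        for (long txid = 1; txid <= 12; txid++) {
            int chosen = (int) (txid % POOL); // cheap selection, no Random needed
            workPerSyncer[chosen]++;
        }
        for (int i = 0; i < POOL; i++) {
            System.out.println("syncer-" + i + " handled " + workPerSyncer[i]);
        }
    }
}
```

Because consecutive txids cycle through the pool, the modulo scheme spreads load evenly without the allocation and seeding cost of a Random instance.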
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840812#comment-13840812 ] Feng Honghua commented on HBASE-8755: - Thanks very much for [~v.himanshu] and [~stack]'s review and further improvement suggestions. Currently I have some other stuff on hand to handle... I'll try my best to come back soon and read through the above comments. Thanks again.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839662#comment-13839662 ] stack commented on HBASE-8755: -- Here is more review on the patch. Make the changes suggested below and I'll +1 it. (Discussion off-line w/ Feng on this issue helped me better understand this patch and put to rest any notion that there is an easier 'fix' than the one proposed here. That said, there is much room for improvement, but this can be done in a follow-on.) Remove these asserts rather than commenting them out, given they depended on a facility this patch removes. Leaving them in will only make the next reader of the code -- very likely lacking the context you have -- feel uneasy, thinking someone removed asserts just to get tests to pass.
-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
On the below... +import java.util.Random; ... using a Random for choosing an arbitrary thread from a list of 4 is heavyweight. Can you not take the last digit of the timestamp or nano timestamp or some attribute of the edit instead? Something more lightweight? Please remove all mentions of AsyncFlush since it no longer exists: // all writes pending on AsyncWrite/AsyncFlush thread with Leaving it in will confuse readers when they can't find any such thread class. Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE Should it be >= failedTxid? This should be volatile since it is set by AsyncSyncer and then used by the main FSHLog thread (you have an assert to check it is not null -- maybe you ran into an issue here already?): + private IOException asyncIOE = null; bq. + private final Object bufferLock = new Object(); 'bufferLock' is a very generic name. Could it be more descriptive?
It is a lock held briefly while AsyncWriter moves queued edits off the globally seen queue to a local queue just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held. Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list that holds the pending writes is pendingWrites.) bq. ...because the HDFS write-method is pretty heavyweight as far as locking is concerned. I think the 'heavyweight' referred to above is hbase locking, not hdfs locking as the comment would imply. If you agree (you know this code better than I do), please adjust the comment. Comments on what these threads do will help the next code reader. AsyncWriter does the adding of edits to HDFS. AsyncSyncer needs a comment because the name is oxymoronic (though it makes sense in this context). In particular, a comment should draw out why we need so many instances of a syncer thread, because everyone's first thought here is going to be: why do we need this? Ditto for the AsyncNotifier. In the reviews above, folks have asked why we need this thread at all, and a code reader will likely think similarly on a first pass. Bottom line: your patch raised questions from reviewers; it would be cool if the questions were answered in code comments where possible so the questions do not come up again. 4 + private final AsyncWriter asyncWriter; 5 + private final AsyncSyncer[] asyncSyncers = new AsyncSyncer[5]; 6 + private final AsyncNotifier asyncNotifier; You remove the LogSyncer facility in this patch. That is good (we need to note this in the release notes). Your patch should remove the optional flush config from hbase-default.xml too, since it is no longer relevant. 3 -this.optionalFlushInterval = 4 - conf.getLong("hbase.regionserver.optionallogflushinterval", 1 * 1000); I see it here...
hbase-common/src/main/resources/hbase-default.xml: <name>hbase.regionserver.optionallogflushinterval</name> A small nit is you might look at other threads in hbase and see how they are named... 3 +asyncWriter = new AsyncWriter("AsyncHLogWriter"); Ditto here: + asyncSyncers[i] = new AsyncSyncer("AsyncHLogSyncer" + i); Probably make the number of asyncSyncers a configuration (you don't have to put the option out in hbase-default.xml; just make it so that if someone reading the code trips over this issue, they can change it by adding to hbase-site.xml w/o having to change code -- let's not reproduce the hard-coded '80' that is in the head of dfsclient we discussed yesterday -- smile). ... and here: asyncNotifier = new AsyncNotifier("AsyncHLogNotifier"); Not important, but check out how other threads are named in hbase. It might be good if these aligned better. Maybe make a method for shutting down all these threads, or use the Threads#shutdown method in Threads.java? bq. LOG.error("Exception while waiting for AsyncNotifier threads to die", e); Do LOG.error(Exception
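The "configurable, but not advertised in hbase-default.xml" pattern stack asks for could be sketched as follows. The property key below is a hypothetical name (not an actual HBase property), and `java.util.Properties` stands in for Hadoop's `Configuration` so the example stays self-contained.

```java
import java.util.Properties;

// Sketch of making the syncer count tunable via site configuration while
// keeping the default hard-coded in code (nothing added to hbase-default.xml).
// Key name and class are illustrative assumptions, not FSHLog's actual code.
public class SyncerConfig {
    static final String KEY = "hbase.regionserver.hlog.asyncer.number"; // hypothetical key
    static final int DEFAULT_SYNCERS = 5; // the hard-coded value the patch uses

    static int syncerCount(Properties conf) {
        String v = conf.getProperty(KEY);
        return v == null ? DEFAULT_SYNCERS : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();     // empty, like a stock deployment
        System.out.println(syncerCount(conf));  // falls back to the default of 5

        conf.setProperty(KEY, "8");             // operator override in hbase-site.xml
        System.out.println(syncerCount(conf));  // now 8, with no code change
    }
}
```

With Hadoop's real `Configuration`, the same shape is a single `conf.getInt(KEY, DEFAULT_SYNCERS)` call.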
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838679#comment-13838679 ] stack commented on HBASE-8755: -- Sorry for the delay getting back to this [~fenghh], and thanks for the explanation. I am having trouble reviewing the patch because I am trying to understand what is going on in FSHLog. It is hard to follow (not your patch necessarily, but what is there currently) in spite of multiple reviews. I keep trying to grok what is going on because this is critical code. The numbers are hard to argue with, and the patch does some nice cleanup of FSHLog which makes it easier to understand. We could commit this patch and then work on undoing the complexity that is rife here; your patch adds yet more because it adds interacting threads w/ new synchronizations, notifications, AtomicBoolean states, etc., which cost performance-wise, but at least it is clearer what is going on and we have tools for comparing approaches now. We could work on simplification and removal of sync points in a follow-on (see below for a note on one approach). I now get why we need multiple syncers. It is a little counter-intuitive: on the one hand we want to batch up edits more to get more performance, but on the other, we have to sync more often because a sync is outstanding for too much time, so much time that it holds up handlers too long. + I am trying to understand why we keep the edits aside in a linked-list. This was there before your time; you just continue the practice. The original comment says: We keep them cached here instead of writing them to HDFS piecemeal, because the HDFS write-method is pretty heavyweight as far as locking is concerned. Yet, when we eventually flush the edits, we don't do anything special; we just call write on the dfsoutputstream. We are not avoiding locking in hdfs. It must be the hbase flush/update locking that is being referred to here.
+ AsyncSyncer is a confounding name for a class -- but it makes sense in this context. The flush object in this thread is a syncer synchronization object, not for memstore flushes... as I thought it was (there is use of 'flush' in here when it probably should be 'sync' to be consistent). Off-list, a few other lads are interested in reviewing this patch (it is a popular patch!)... our [~j...@cloudera.com] and possibly [~himan...@cloudera.com], because they are getting stuck in this area. If they don't get to it soon, I'll commit unless there is an objection. A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race conditions on updateLock and flushLock. The only optimization (checking the current syncTillHere txid in the expectation that another thread will write/sync one's own txid to hdfs, so the write/sync can be omitted) actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing the hdfs sequence file, and the prototype implementation shows a 4X improvement in throughput (from 17000 to 7+).
I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested. The change for the new write thread model is as below:
1 All put handler threads append the edits to HLog's local pending buffer; (each notifies the AsyncWriter thread that there are new edits in the local buffer)
2 All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that contains their txid;
3 A single AsyncWriter thread is responsible for retrieving all the buffered edits in HLog's local pending buffer and writing them to hdfs (hlog.writer.append); (it notifies the AsyncFlusher thread that there are new writes to hdfs that need a sync)
4 A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter; (it notifies the AsyncNotifier thread that the sync watermark has increased)
5 A single AsyncNotifier thread is responsible for notifying all pending put handler threads which are waiting in the HLog.syncer() function
6 No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads always do the same job it did)
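The staged handoff described in the steps above can be sketched as a toy Java pipeline. This is our illustration, not the patch's code: the class and field names (buffered/written/synced watermarks) are assumptions, the real FSHLog uses wait/notify between dedicated AsyncWriter/AsyncSyncer/AsyncNotifier threads rather than polling, and a real sync hits hdfs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the pipeline: handlers append (raising "buffered"), a
// writer-like thread raises "written", a syncer-like thread raises
// "synced", and a handler blocked in syncer(txid) proceeds once the
// synced watermark covers its txid. All names are illustrative.
public class WalPipelineSketch {
    final AtomicLong buffered = new AtomicLong(); // highest txid appended by handlers
    final AtomicLong written = new AtomicLong();  // highest txid pushed to (simulated) hdfs
    final AtomicLong synced = new AtomicLong();   // highest txid durably (simulated) synced

    long append() { return buffered.incrementAndGet(); } // handler side

    // Handler blocks here until the sync watermark covers its txid.
    void syncer(long txid) throws InterruptedException {
        while (synced.get() < txid) Thread.sleep(1);
    }

    void runBackgroundThreads(long lastTxid) {
        Thread writer = new Thread(() -> {        // plays the single AsyncWriter
            while (written.get() < lastTxid) written.set(buffered.get());
        });
        Thread syncerThread = new Thread(() -> {  // plays one AsyncSyncer
            while (synced.get() < lastTxid) synced.set(written.get());
        });
        writer.start();
        syncerThread.start();
    }

    public static void main(String[] args) throws Exception {
        WalPipelineSketch wal = new WalPipelineSketch();
        wal.runBackgroundThreads(100);
        for (int i = 0; i < 100; i++) {
            long txid = wal.append();
            wal.syncer(txid); // returns only after synced >= txid
        }
        System.out.println("all edits synced, watermark=" + wal.synced.get());
    }
}
```

The point of the model is that handlers never write or sync themselves; they only wait on a monotonically increasing watermark, which is what removes the contention on updateLock/flushLock.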
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838687#comment-13838687 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack], and also the review from [~j...@cloudera.com] and [~himan...@cloudera.com]; review from other experienced guys is welcome too, it's better to have this patch reviewed by as many guys as possible. Any question on this patch is welcome :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832233#comment-13832233 ] Feng Honghua commented on HBASE-8755: - Tried 4 asyncSyncer threads; below are the results. A bit worse than 5 threads, but looks acceptable? ||threads||ops-wo-patch||ops-w-patch||ops-diff|| |1|1716|1572|-8.4%| |3|3179|3189|+0.3%| |5|6091|5593|-8.1%| |10|8760|9450|+7.8%| |25|13019|18055|+38.7%| |50|14995|26597|+77.3%| |100|18824|51441|+173.2%| |200|18144|61531|+239.1%| Additional explanation on correctness when introducing extra asyncSyncer threads: - when a txid (t0) is notified, all txids smaller than t0 must already have been written to hdfs and sync-ed: before t0 is notified, t0 must be sync-ed by an asyncSyncer thread; before t0 is sync-ed, t0 must be written to hdfs by the asyncWriter thread; before t0 is written to hdfs, all txids smaller than t0 must be written to hdfs, so the sync of t0 guarantees that all txids smaller than t0 are sync-ed (either before the sync of t0, or by the sync of t0) - when a txid (t0) can't find a free (idle) asyncSyncer thread and is added to a random one, it won't be sync-ed until its asyncSyncer thread is done with the txid at hand. But its entries have already been written to hdfs, and if any txid bigger than t0 (say t1) is successfully sync-ed by another parallel asyncSyncer thread, that sync guarantees t0 is also successfully sync-ed; hence when t1 is notified, t0 can also be correctly notified. Any further comments?
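The correctness argument above rests on the single writer pushing txids to hdfs in increasing order, so a completed sync observed at a higher txid covers every lower one. A minimal sketch (our names, not FSHLog's):

```java
// Illustration of the "sync watermark" reasoning above: because txids are
// written in increasing order, a successful sync that covers t1 also makes
// every t0 <= t1 durable, even if t0 was parked on a busy syncer thread.
// Class and method names are illustrative assumptions.
public class WatermarkCoverage {
    private long highestSynced = 0; // shared sync watermark, only ever raised

    // A syncer thread that completes a sync raises the watermark to the
    // largest txid known to be written before the sync started.
    void syncCompleted(long txidCoveredBySync) {
        highestSynced = Math.max(highestSynced, txidCoveredBySync);
    }

    boolean isDurable(long txid) {
        return txid <= highestSynced;
    }

    public static void main(String[] args) {
        WatermarkCoverage w = new WatermarkCoverage();
        // t0 = 7 was handed to a busy syncer; a sibling syncer then completes
        // a sync covering t1 = 10. That sync makes t0 durable as well.
        w.syncCompleted(10);
        System.out.println(w.isDurable(7));  // true: 7 <= 10
        System.out.println(w.isDurable(11)); // false: not yet synced
    }
}
```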
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830678#comment-13830678 ] Feng Honghua commented on HBASE-8755: - bq. Do these no longer pass? => yes, under the new thread model there is no explicit method to do the sync, and we can't tell if there are outstanding deferred entries (the AsyncWriter/AsyncSyncer threads do write/sync in a best-effort way) bq. We have hard-coded 5 asyncSyncers? Why 5? => yes, I tried 2/3/5/10 and found 5 is the best number (2/3 have worse perf, 10 has equal perf but introduces too many extra threads) bq. If we fail to find a free syncer, I don't follow what is going on w/ choosing a random syncer and setting txid as in below => when we fail to find an idle syncer (one not currently doing a sync), choosing a random syncer and setting txid that way falls back to the behavior before the extra asyncSyncer threads were introduced: when asyncWriter pushes new entries to hdfs before asyncSyncer syncs the previously pushed ones, asyncSyncer gets notified of the newly pushed txid, but that txid will be synced the next time around, after asyncSyncer is done with the current ones (notice we use txidToFlush to record the txid each sync is for, and it can't change during a sync, while writtenTxid can change during a sync). To summarize: the sync operation is the most time-consuming phase. Under the old write model, every write handler issues a separate sync directly for itself (if it does not return early via syncedTillHere),
and under the new write model, the separate threads significantly reduce the lock race, but if there are few concurrent write threads, the benefit of reducing the lock race (fewer write threads, less benefit) can't offset the inefficiency of a single asyncSyncer thread (each time, the asyncSyncer thread can only sync a portion of the writes, and the write handlers which already have their entries in the buffer or pushed to hdfs also need to wait for that sync to complete, and can't proceed until the next sync phase is done). By introducing extra asyncSyncer threads, the correctness of this model stays the same as before: there is still a single asyncWriter thread which pushes buffered entries to hdfs sequentially (txid increases sequentially), and when each asyncSyncer is done, it's guaranteed that all smaller txids have been pushed to hdfs and successfully sync-ed.
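The old-model "return early via syncedTillHere" optimization referred to above can be sketched like this. The class name and counter are illustrative; the real FSHLog check involves more state.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the old write model's only optimization: before issuing its own
// (expensive) hdfs sync, a handler checks whether some other thread's sync
// already covered its txid. Names are illustrative assumptions.
public class SyncedTillHere {
    final AtomicLong syncedTillHere = new AtomicLong();
    int hdfsSyncs = 0; // counts the simulated expensive sync calls

    void sync(long txid) {
        if (txid <= syncedTillHere.get()) return;  // another thread's sync covered us
        hdfsSyncs++;                               // simulate the expensive hdfs sync
        syncedTillHere.accumulateAndGet(txid, Math::max);
    }

    public static void main(String[] args) {
        SyncedTillHere log = new SyncedTillHere();
        log.sync(5); // pays for a real sync, watermark -> 5
        log.sync(3); // already covered, returns early
        log.sync(7); // watermark 5 < 7, pays again
        System.out.println(log.hdfsSyncs); // 2
    }
}
```

As the thread notes, this early return saves a sync only when another handler happens to have synced a larger txid first, which is why it helps much less than hoped under contention.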
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830681#comment-13830681 ] Feng Honghua commented on HBASE-8755: - bq. => yes, under the new thread model there is no explicit method to do the sync, and we can't tell if there are outstanding deferred entries (the AsyncWriter/AsyncSyncer threads do write/sync in a best-effort way) I meant that calling 'sync' can guarantee there are no outstanding deferred entries, but not calling 'sync' after a write can't guarantee there must be some outstanding deferred entries, since they can be sync-ed by the asyncWriter/asyncSyncer threads. This is not the same behavior as under the old write model.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830753#comment-13830753 ] Ted Yu commented on HBASE-8755: --- bq. 2/3 have worse perf, 10 has equal perf but introduces too many extra threads Do you remember how many extra threads were introduced?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830755#comment-13830755 ] stack commented on HBASE-8755: -- bq. Do you remember how many extra threads were introduced? This question is answered above in my last review. Why do you ask?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830356#comment-13830356 ] stack commented on HBASE-8755: -- I ran Feng's script, except I left out trying 200 threads. ||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff|| |1|973.673|1027.039|1119.825|892.997|-15%| |3|1303.891|2300.806|1400.848|2141.560|-7%| |5|855.775|5842.657|873.990|5720.889|-2%| |10|1093.330|9146.370|1090.158|9172.982|0%| |25|1632.263|15316.160|1215.196|20572.813|+25%| |50|2432.653|20553.691|1341.847|37262.070|+45%| |100|4058.650|24638.734|1725.729|57946.527|+57%| This was stock hadoop 2.2 and the tip of hbase trunk. Those are pretty big improvements once we pass ten concurrent threads. A small slowdown with only one writer is acceptable, I'd say. Let me look again at the patch.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830360#comment-13830360 ] stack commented on HBASE-8755: -- Have we looked at using the disruptor here? MPSC seems to be what we have going on, which the disruptor seems pretty well suited to. Looking at the patch now...
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830516#comment-13830516 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack]. A small clarification of why we got such different ops-diff numbers: I used *ops-diff = (new - old) / old* while you used *ops-diff = (new - old) / new*. For example, in the 100-thread result the old ops is 24638 and the new one is 57946, so the ops-diff is *+135.2%* (the new one is 2.35 times the old one), while you got *57%* :-)
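The two formulas are easy to check side by side; here is a tiny standalone sketch (class and method names are mine) applied to the 100-thread row:

```java
public class OpsDiff {
    // Percentage gain relative to the OLD throughput (Feng's formula).
    static double diffVsOld(double oldOps, double newOps) {
        return 100.0 * (newOps - oldOps) / oldOps;
    }

    // Percentage gain relative to the NEW throughput (the formula used in the earlier table).
    static double diffVsNew(double oldOps, double newOps) {
        return 100.0 * (newOps - oldOps) / newOps;
    }

    public static void main(String[] args) {
        // 100-thread row: old = 24638.734 ops, new = 57946.527 ops
        System.out.printf("vs old: %.1f%%%n", diffVsOld(24638.734, 57946.527)); // ~135.2%
        System.out.printf("vs new: %.1f%%%n", diffVsNew(24638.734, 57946.527)); // ~57.5%
    }
}
```

The diff-vs-old form is the conventional speedup measure, which is why the corrected table below reads +135% for 100 threads.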
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830563#comment-13830563 ] stack commented on HBASE-8755: -- Pardon my bad math [~fenghh]. Below is the correction (the numbers look better now).
||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff||
|1|973.673|1027.039|1119.825|892.997|-15%|
|3|1303.891|2300.806|1400.848|2141.560|-7%|
|5|855.775|5842.657|873.990|5720.889|-2%|
|10|1093.330|9146.370|1090.158|9172.982|0%|
|25|1632.263|15316.160|1215.196|20572.813|+34%|
|50|2432.653|20553.691|1341.847|37262.070|+81%|
|100|4058.650|24638.734|1725.729|57946.527|+135%|
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830604#comment-13830604 ] stack commented on HBASE-8755: -- Looking at the patch... Do these no longer pass?
{code}
-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
{code}
We have hard-coded 5 asyncSyncers? Why 5? We are starting 7 threads to run this new write model: the 5 syncers, a notifier, and a writer (I suppose we are also removing one, the old LogSyncer -- so 6 new threads). We set the below outside of a sync block:
{code}
+    this.asyncWriter.setPendingTxid(txid);
{code}
Could we set the txid out of order here? If so, will that be a problem? Hmm... it seems not. If we get a lower txid, we just return and prevail w/ the highest we've seen. If we fail to find a free syncer, I don't follow what is going on w/ choosing a random syncer and setting the txid as below (generally the comments in the patch are good -- this looks like a section that could do w/ some):
{code}
+    this.lastWrittenTxid = this.txidToWrite;
+    boolean hasIdleSyncer = false;
+    for (int i = 0; i < asyncSyncers.length; ++i) {
+      if (!asyncSyncers[i].isSyncing()) {
+        hasIdleSyncer = true;
+        asyncSyncers[i].setWrittenTxid(this.lastWrittenTxid);
+        break;
+      }
+    }
+    if (!hasIdleSyncer) {
+      int idx = rd.nextInt(asyncSyncers.length);
+      asyncSyncers[idx].setWrittenTxid(this.lastWrittenTxid);
+    }
{code}
Thanks.
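One plausible reading of the quoted selection logic, restated as standalone code with the comments stack asked for (AsyncSyncer here is a stub of my own, and the rationale in the comments is my guess at the intent, not a statement of the patch's semantics):

```java
import java.util.Random;

public class SyncerChoice {
    // Stand-in for the patch's AsyncSyncer thread; only the fields the
    // selection logic touches are modeled.
    static class AsyncSyncer {
        volatile boolean syncing = false;
        volatile long writtenTxid = -1;
        boolean isSyncing() { return syncing; }
        void setWrittenTxid(long txid) { writtenTxid = txid; }
    }

    // Prefer an idle syncer so the sync can start immediately. If every
    // syncer is busy, hand the txid to a random one anyway: a sync of a
    // higher txid also covers all lower txids, so the chosen syncer can
    // pick up the new target on its next round. Returns the chosen index.
    static int choose(AsyncSyncer[] syncers, long txid, Random rd) {
        for (int i = 0; i < syncers.length; ++i) {
            if (!syncers[i].isSyncing()) {
                syncers[i].setWrittenTxid(txid);
                return i;
            }
        }
        int idx = rd.nextInt(syncers.length);
        syncers[idx].setWrittenTxid(txid);
        return idx;
    }
}
```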
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829631#comment-13829631 ] Feng Honghua commented on HBASE-8755: - [~stack]: I fixed the downgrade for 5 threads and below are the test results; the write throughput capacity after tuning is *about 3.5 times* the one before tuning, while the perf stays almost equal when the thread number is <= 5. (For protobuf/test compatibility reasons I based this on hbase trunk 0.97 r1516083.)
{code}
for i in 1 3 5 10 25 50 100 200; do for j in 1; do ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -path /user/h_fenghonghua/new-thread-v2/ -verify -threads ${i} -iterations 100 -keySize 50 -valueSize 100 > log-patch${i}.${j}.txt; grep Summary: log-patch${i}.${j}.txt; done; done
{code}
||threads||time-without-patch||ops-without-patch||time-with-patch||ops-with-patch||ops-diff||
|1|582|1716|630|1586|-7.5%|
|3|943|3179|951|3153|-0.8%|
|5|820|6091|847|5899|-3.1%|
|10|1141|8760|983|10166|+16%|
|25|1920|13019|1286|19426|+49.2%|
|50|3334|14995|1627|30715|+104.8%|
|100|5312|18824|1925|51943|+185%|
|200|11022|18144|3229|61922|+241.2%|
[~jmspaggi]: I attached patches for the latest trunk (HBASE-8755-trunk-v4.patch) and for the latest 0.96 (HBASE-8755-0.96-v0.patch).
Thanks very much if you can run similar comparison perf tests; please feel free to raise any issues :-)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829656#comment-13829656 ] stack commented on HBASE-8755: -- I'm starting the tests running now.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820068#comment-13820068 ] Jean-Marc Spaggiari commented on HBASE-8755: [~fenghh], any chance to rebase on the latest 0.96 and optimize for fewer than 5 clients? If so, I can give it a spin on a small cluster and run the PerfEval suite for 24h to see the results.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820794#comment-13820794 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] Sure, I'm tuning for fewer than 5 threads, and will provide a patch based on the latest 0.96 once the bottleneck is found and the tuning is done.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808163#comment-13808163 ] Sergey Shelukhin commented on HBASE-8755: - Nice. Thanks!
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807214#comment-13807214 ] Sergey Shelukhin commented on HBASE-8755: - Question from above:
{quote}
...is a separate notifier thread necessary? It doesn't do any blocking operations other than interacting with the flusher thread, or taking the syncedTillHere lock, which looks like it should be uncontested most of the time. Couldn't the flusher thread have the ~4 lines that set syncedTillHere?
{quote}
I apply this new write thread model in HLog and the performance test in our test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS), the 1 RS write throughput (1K row-size) even beats the one of BigTable (Precolator published in 2011 says Bigtable's write throughput then is 31002). I can provide the detailed performance test results if anyone is interested. The change for new write thread model is as below: 1 All put handler threads append the edits to HLog's local pending buffer; (it notifies AsyncWriter thread that there is new edits in local buffer) 2 All put handler threads wait in HLog.syncer() function for underlying threads to finish the sync that contains its txid; 3 An single AsyncWriter thread is responsible for retrieve all the buffered edits in HLog's local pending buffer and write to the hdfs (hlog.writer.append); (it notifies AsyncFlusher thread that there is new writes to hdfs that needs a sync) 4 An single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter; (it notifies the AsyncNotifier thread that sync watermark increases) 5 An single AsyncNotifier thread is responsible for notifying all pending put handler threads which are waiting in the HLog.syncer() function 6 No LogSyncer thread any more (since there is always AsyncWriter/AsyncFlusher threads do the same job it does) -- This message was sent by Atlassian JIRA (v6.1#6144)
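For contrast with the pipeline, the per-write locking cycle of the current model described above can be sketched as follows. This is a simplified illustration assuming the described updateLock/flushLock structure and the syncedTillHere short-circuit, not the actual FSHLog code.

```java
/**
 * Sketch of the old model: every handler serializes on updateLock for the
 * append and on flushLock for the sync, once per write. The syncedTillHere
 * check is the (weak) optimization the description mentions: skip the sync
 * if another handler's sync already covered our txid. Illustrative only.
 */
public class OldWriteModelSketch {
    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    private long txid = 0;                   // guarded by updateLock
    private volatile long syncedTillHere = 0;

    public long put(String edit) {
        long myTxid;
        synchronized (updateLock) {          // contention point 1: serialized appends
            myTxid = ++txid;
            // writer.append(edit): buffered, cheap
        }
        if (syncedTillHere >= myTxid) {
            return myTxid;                   // someone else's sync already persisted our edit
        }
        synchronized (flushLock) {           // contention point 2: serialized syncs
            if (syncedTillHere < myTxid) {
                // writer.sync(): the expensive hflush to the datanode pipeline
                syncedTillHere = myTxid;
            }
        }
        return myTxid;
    }

    public static void main(String[] args) throws Exception {
        OldWriteModelSketch log = new OldWriteModelSketch();
        Thread[] handlers = new Thread[4];
        for (int i = 0; i < handlers.length; i++) {
            handlers[i] = new Thread(() -> {
                for (int j = 0; j < 25; j++) log.put("edit");
            });
            handlers[i].start();
        }
        for (Thread t : handlers) t.join();
        System.out.println("appended=" + log.txid);
    }
}
```

The short-circuit only helps a handler whose sync happened to be covered while it was between the two locks, which is why it buys much less than dedicated writer/flusher threads that batch by construction.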
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807618#comment-13807618 ] Hangjun Ye commented on HBASE-8755: --- [~sershe], that's a very good question. We found syncedTillHere.notifyAll() was a blocking operation and it might take a long time if many threads were waiting on it (which also surprised us). So we put this logic in a separate thread.
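The fix described in the comment above, moving the wakeup into a dedicated thread so the flusher never stalls on the waiters' monitor, can be sketched with a queue handoff. The queue-based handoff and all names here are assumptions for illustration, not the actual HLog implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Instead of the flusher calling notifyAll() itself (and stalling while
 * many waiters are woken), it drops the new watermark on a queue and a
 * dedicated notifier thread performs the wakeup. Illustrative sketch only.
 */
public class NotifierHandoff {
    private final BlockingQueue<Long> handoff = new LinkedBlockingQueue<>();
    private final Object waiters = new Object();
    private long syncedTillHere = 0; // guarded by waiters

    public void startNotifier() {
        Thread notifier = new Thread(() -> {
            try {
                while (true) {
                    long mark = handoff.take();       // handed over by the flusher
                    synchronized (waiters) {
                        if (mark > syncedTillHere) syncedTillHere = mark;
                        waiters.notifyAll();          // the potentially slow wakeup happens here,
                    }                                 // off the flusher's critical path
                }
            } catch (InterruptedException ignored) { }
        });
        notifier.setDaemon(true);
        notifier.start();
    }

    /** Flusher side: publish a completed sync without touching the monitor. */
    public void publishSync(long txid) {
        handoff.add(txid);                            // non-blocking on an unbounded queue
    }

    /** Handler side: the wait inside HLog.syncer(). */
    public long waitForSync(long txid) throws InterruptedException {
        synchronized (waiters) {
            while (syncedTillHere < txid) waiters.wait();
            return syncedTillHere;
        }
    }

    public static void main(String[] args) throws Exception {
        NotifierHandoff h = new NotifierHandoff();
        h.startNotifier();
        Thread handler = new Thread(() -> {
            try {
                System.out.println("handler saw watermark=" + h.waitForSync(1));
            } catch (InterruptedException ignored) { }
        });
        handler.start();
        Thread.sleep(50);   // give the handler time to start waiting (demo only)
        h.publishSync(1);   // flusher reports txid 1 as synced
        handler.join();
    }
}
```

The design point is that notifyAll() on a monitor with many waiters does real work proportional to the number of waiters, so keeping it off the flusher lets the next sync start immediately.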
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803932#comment-13803932 ] stack commented on HBASE-8755: -- Five datanodes with fusionio. hbase tip of 0.96 branch and hadoop-2.1.0-beta. HLogPE on master node.

||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|1|1048.033s|954.168ops/s|1100.423s|908.741ops/s|
|1|1042.126s|959.577ops/s|1156.557s|864.635ops/s|
|1|1052.601s|950.028ops/s|1143.271s|874.683ops/s|
|5|904.176s|5529.896ops/s|1916.229s|2609.292ops/s|
|5|910.469s|5491.675ops/s|1911.841s|2615.280ops/s|
|5|925.778s|5400.863ops/s|1970.565s|2537.344ops/s|
|50|2699.752s|18520.221ops/s|1889.877s|26456.748ops/s|
|50|2689.678s|18589.586ops/s|1922.716s|26004.881ops/s|
|50|2711.144s|18442.398ops/s|1893.439s|26406.977ops/s|
|75|4945.563s|15165.108ops/s|1997.553s|37545.938ops/s|
|75|4852.779s|15455.063ops/s|1992.425s|37642.570ops/s|
|75|4921.685s|15238.684ops/s|-|-|
|100|6224.527s|16065.479ops/s|2086.691s|47922.766ops/s|
|100|6195.727s|16140.156ops/s|2091.869s|47804.145ops/s|

Diffs are small with 1 thread only. It's bad at 5 threads, but thereafter the patch starts to shine. If we could make the 5-thread case better, we could commit this patch.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804957#comment-13804957 ] Feng Honghua commented on HBASE-8755: - [~stack] thanks a lot for the detailed perf test. Agree with you; we'll try to make the 5-thread case better.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792389#comment-13792389 ] Feng Honghua commented on HBASE-8755: - [~stack] thanks a lot :-) btw: the HLogPE withoutPatch/withPatch test result against 0.94.3 (our internal branch), done by [~liushaohui], matches the one against the hdfs sequence file described in the 'Description' of this JIRA, as below (17K vs. 68K):
bq. Three of my colleagues(Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence file and the prototype implementation shows a 4X improvement for throughput (from 17000 to 7+).
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792422#comment-13792422 ] stack commented on HBASE-8755: -- Does that mean we can just use HLogPE going forward to evaluate this patch and not have to set up a cluster? Because it gives the same results as a cluster does? Thanks [~fenghh]
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792430#comment-13792430 ] Feng Honghua commented on HBASE-8755: - [~stack]: Agree. It seems HLogPE is enough for profiling/evaluating; shall we run YCSB on a cluster later for double verification? :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792433#comment-13792433 ] Hangjun Ye commented on HBASE-8755: --- To clarify, an hdfs cluster is still needed to store the sequence file.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792279#comment-13792279 ] Liu Shaohui commented on HBASE-8755: [~stack] [~fenghh] [~yehangjun] We redid the same HLogPE comparison test using hbase trunk. Results follow.
Test env: hdfs: cdh 4.1.0, five datanodes, each node with 12 SATA disks. hbase: hbase trunk 0.97 r1516083 (in r1516084, trunk updated protobuf to 2.5, which is incompatible with the protobuf 2.4 used in cdh 4.1.0). HLogPE is run on one of these datanodes, so one replica of the hlog's block will be on the local datanode.

||Thread number||Time without Patch(s)||Ops without Patch||Time with Patch(s)||Ops with Patch||Time diff %||Ops diff %||
|1|580.309|1723.22|624.709|1600.745|-7.65|-7.11|
|1|591.177|1691.541|631.34|1583.932|-6.79|-6.36|
|1|591.948|1689.338|634.518|1575.999|-7.19|-6.71|
|5|794.034|6296.959|1201.563|4161.247|-51.32|-33.92|
|5|781.033|6401.778|1191.776|4195.419|-52.59|-34.46|
|5|805.597|6206.577|1187.179|4211.665|-47.37|-32.14|
|50|3222.659|15515.139|1815.586|27539.316|43.66|77.50|
|50|3191.131|15668.426|1821.956|27443.033|42.91|75.15|
|50|3222.407|15516.352|1817.754|27506.473|43.59|77.27|
|75|4517.149|16603.393|2024.359|37048.766|55.19|123.14|
|75|4498.987|16670.42|2016.899|37185.797|55.17|123.06|
|75|4554.122|16468.598|2037.155|36816.051|55.27|123.55|
|100|5186.292|19281.598|2147.581|46564.016|58.59|141.49|
|100|5181.344|19300.012|2135.768|46821.563|58.78|142.60|
|100|5189.396|19270.064|2143.529|46652.039|58.69|142.10|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792293#comment-13792293 ] stack commented on HBASE-8755: -- [~fenghh] Our results are the same?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792301#comment-13792301 ] Feng Honghua commented on HBASE-8755: - [~stack]: Yes, seems this patch against trunk has obvious downgrade compared to against 0.94.3(our internal branch) for: 1) 5 threads: withPatch has worse perf than withoutPatch (33% ops downgrade, 4.2K vs. 6.3K) 2) 100 threads: withpatch's perf is about 2.5X of withoutPatch (46.6K vs. 19.2K) A short summary: 1) withoutPatch, the max ops of HLog is less than 20K (19K for trunk and 17K for 0.94.3) 2) withPatch, the max ops of HLog is more than 45K (46K for trunk and 68K for 0.94.3) 3) for trunk, withPatch can have even worse perf than withoutPatch (about 33% downgrade) We'll try to figure out why withPatch performs worse than withoutPatch for trunk, and try to ensure the performance is about equal when stress is low and still keep obvious upgrade when stress is high. :-) [~stack] : would you please redo the test using 75/100 threads to re-confirm whether the ops upgrade matches our tests? (we see 37K vs 16K for 75 threads and 46K vs 19K for 100 threads) [~zjushch] : what version of HBase do you apply the patch to? trunk or 0.94? 
I wonder if the reason for our difference is the same as for stack's.

A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Fix For: 0.96.1 Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch

In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) -> HLog writer append (write to hdfs) -> HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy contention on updateLock and flushLock. The only existing optimization, checking the current syncTillHere txid in the hope that another thread has already written/synced its txid to hdfs so that the write/sync can be skipped, helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792313#comment-13792313 ] stack commented on HBASE-8755: -- [~fenghh] will do. Sorry. Meant to do it earlier. Will try to do some profiling too myself to see why it's worse when the thread count is low.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779581#comment-13779581 ] Liu Shaohui commented on HBASE-8755: [~stack] [~fenghh] We redid the comparison test using HLogPE. Here are the results. Test env: hdfs: cdh 4.1.0, five datanodes, each node with 12 SATA disks; hbase: 0.94.3. HLogPE is run on one of these datanodes, so one replica of the hlog's block will be on the local datanode. The params of HLogPE are -iterations 100 -keySize 50 -valueSize 100, the same as in stack's tests.
{code}
for i in 1 5 50 75 100; do
  for j in 1 2 3; do
    ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -keySize 50 -valueSize 100 log-patch${i}.${j}.txt;
    grep Summary: log-patch${i}.${j}.txt;
  done;
done
{code}
||Thread Count||WithoutPatch (s)||WithPatch (s)||Time diff %||
|1|579.380|625.937|-8.03|
|1|580.307|630.346|-8.62|
|1|577.853|654.205|-13.21|
|5|799.579|785.696|1.73|
|5|795.013|780.642|1.80|
|5|826.270|781.909|5.36|
|50|3290.482|1165.773|64.57|
|50|3298.387|1167.992|64.58|
|50|3224.495|1154.921|64.18|
|75|4450.760|1253.448|71.83|
|75|4506.143|1269.806|71.82|
|75|4516.453|1245.954|72.41|
|100|5561.074|1493.102|73.15|
|100|5616.810|1496.263|73.36|
|100|5612.268|1468.500|73.83|
a, When the thread number is 1, the performance of our test is about 40% better than that of stack's test, in both the old and the new thread model. [~stack], what is the hdfs version in your test, or are there special configs? b, When the thread number is 5, we do not see the ~50% regression.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779607#comment-13779607 ] Liu Shaohui commented on HBASE-8755: updated the result table and added the ops diff
||Thread number||Time without Patch||Ops without Patch||Time with Patch||Ops with Patch||Time diff %||Ops diff %||
|1|579.38|1725.983|625.937|1597.605|-8.04|-7.44|
|1|580.307|1723.226|630.346|1586.43|-8.62|-7.94|
|1|577.853|1730.544|654.205|1528.573|-13.21|-11.67|
|5|799.579|6253.291|785.696|6363.785|1.74|1.77|
|5|795.013|6289.206|780.642|6404.984|1.81|1.84|
|5|826.27|6051.291|781.909|6394.606|5.37|5.67|
|50|3290.482|15195.343|1165.773|42890|64.57|182.26|
|50|3298.387|15158.925|1167.992|42808.516|64.59|182.40|
|50|3224.495|15506.304|1154.921|43293.004|64.18|179.20|
|75|4450.76|16851.055|1253.448|59834.953|71.84|255.08|
|75|4506.143|16643.945|1269.806|59064.141|71.82|254.87|
|75|4516.453|16605.951|1245.954|60194.84|72.41|262.49|
|100|5561.074|17982.137|1493.102|66974.656|73.15|272.45|
|100|5616.81|17803.699|1496.263|66833.172|73.36|275.39|
|100|5612.268|17818.107|1468.5|68096.695|73.83|282.18|
Time diff = (Time without Patch - Time with Patch) / Time without Patch * 100
Ops diff = (Ops with Patch - Ops without Patch) / Ops without Patch * 100
[~stack] What are the hdfs and hbase versions of your test? We may redo the tests in a cluster with the same hdfs and hbase versions as yours.
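As a sanity check on the two formulas above, here is a small self-contained computation reproducing the last table row (100 threads); the `DiffCheck` class name is invented for illustration and the values are copied from the table:

```java
// Worked example of the Time diff / Ops diff formulas, using the last table row.
public class DiffCheck {
    // Time diff = (Time without Patch - Time with Patch) / Time without Patch * 100
    static double timeDiff(double timeWithout, double timeWith) {
        return (timeWithout - timeWith) / timeWithout * 100;
    }

    // Ops diff = (Ops with Patch - Ops without Patch) / Ops without Patch * 100
    static double opsDiff(double opsWithout, double opsWith) {
        return (opsWith - opsWithout) / opsWithout * 100;
    }

    public static void main(String[] args) {
        double t = timeDiff(5612.268, 1468.5);    // -> 73.83 (the patched run finishes 73.83% faster)
        double o = opsDiff(17818.107, 68096.695); // -> 282.18 (about 2.8x more ops/s)
        System.out.printf("time diff %.2f%%, ops diff %.2f%%%n", t, o);
    }
}
```

Note the two formulas have opposite orientation: a positive Time diff means the patched run took less time, while a positive Ops diff means it achieved more ops/s, so both read as "bigger is better".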
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774603#comment-13774603 ] Feng Honghua commented on HBASE-8755: - bq. If I weren't in Germany right now with no VPN access I'd offer to load 0.94 with the patch onto one of our test clusters and run some workload on it...
[~lhofhansl] Wondering if you now have access to your test clusters; if possible, would you please also help do the performance comparison test on them? (I'm confused why no one else gets a comparable improvement in their test results, while we get results with almost the same improvement across multiple tests.)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775651#comment-13775651 ] stack commented on HBASE-8755: -- bq. What was the implication of not shutting this? Were tests failing or is this just make-work?
I did not save avg ops/s. It was a set amount of work.
bq. 2. is the write load against a single node, or five nodes? – to confirm the throughput is per-node or per-cluster (with five nodes)
I had a client writing to a WAL hosted in hdfs on a 5-node HDFS cluster. Thanks [~fenghh]
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771714#comment-13771714 ] Hangjun Ye commented on HBASE-8755: --- Thanks [~stack] for doing this experiment! The result looks positive in some ways. China is on holiday (Mid-Autumn Festival) until this Saturday; we will check your suggestion after the holiday.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772189#comment-13772189 ] stack commented on HBASE-8755: -- [~fenghh] Happy holidays! I didn't profile to see where the time is spent. That would be next, I'd say.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770888#comment-13770888 ] stack commented on HBASE-8755: -- Just parking the numbers here. It takes the same time to write 5x as much (5M entries by 5 threads) as 1M entries by 1 thread; 50x takes 6x longer (50 threads writing 50M entries).
{code}
/tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 991.258s 1008.819ops/s
/tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 924.881s 1081.220ops/s
/tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 979.312s 1021.125ops/s
/tmp/log-patch50.1.txt:2013-09-17 23:29:45,188 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2960.095s 16891.350ops/s
/tmp/log-patch50.2.txt:2013-09-18 00:22:48,849 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2924.844s 17094.930ops/s
/tmp/log-patch50.3.txt:2013-09-18 01:15:58,646 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2927.617s 17078.736ops/s
/tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 950.968s 5257.800ops/s
/tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 939.312s 5323.045ops/s
/tmp/log-patch5.3.txt:2013-09-17 22:39:56,011 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 947.183s 5278.811ops/s
{code}
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771587#comment-13771587 ] stack commented on HBASE-8755: -- I was wrong that HLogPE called doWrite. It calls append as an HRegionServer would. Here are the numbers. They confirm what [~zjushch] found way back up top of this issue. I was running HLogPE against a five-node HDFS cluster. The DataNodes were persisting to a fusionio drive, so there was little friction at the drive. Below are the elapsed seconds running the following: {code} $ for i in 1 5 50; do for j in 1 2 3; do ./bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -nocleanup -keySize 50 -valueSize 100 /tmp/log-patch${i}.${j}.txt; done; done {code}
||Thread Count||WithoutPatch||WithPatch||%diff||
|1|991.258|1125.208|-11.90|
|1|924.881|1137.754|-18.70|
|1|979.312|1142.959|-14.31|
|5|950.968|1914.448|-50.32|
|5|939.312|1918.188|-51.03|
|5|947.183|1939.806|-51.17|
|50|2960.095|1918.808|54.26|
|50|2924.844|1933.020|51.30|
|50|2927.617|1955.358|49.72|
So: about 20% slower for single-threaded writes, about 50% slower with five threads writing concurrently, BUT about 50% faster with 50 concurrent threads (our current default). Can we somehow have the best of both worlds, where we switch to this new model under high contention?
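One way to get the "best of both worlds" asked about above could be a simple contention probe. This is entirely hypothetical (not from any attached patch): count the handler threads currently inside the sync path and only hand off to the batching AsyncWriter/AsyncFlusher pipeline once that count crosses a threshold, syncing inline otherwise:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical contention probe: callers bracket their sync with
// enterSync()/exitSync(); useBatchedPath() says whether enough handlers are
// concurrently syncing to make the batching pipeline pay for its latency.
class AdaptiveSyncChooser {
    private final AtomicInteger inFlightSyncers = new AtomicInteger();
    private final int handoffThreshold;

    AdaptiveSyncChooser(int handoffThreshold) {
        this.handoffThreshold = handoffThreshold;
    }

    // True: hand the txid to the batching AsyncWriter/AsyncFlusher path.
    // False: an inline append+sync is likely cheaper at this contention level.
    boolean useBatchedPath() {
        return inFlightSyncers.get() >= handoffThreshold;
    }

    void enterSync() {
        inFlightSyncers.incrementAndGet();
    }

    void exitSync() {
        inFlightSyncers.decrementAndGet();
    }
}
```

The threshold would have to be tuned empirically; the numbers above suggest the crossover sits somewhere between 5 and 50 concurrent writers on that cluster.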
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770455#comment-13770455 ] stack commented on HBASE-8755: -- Trying this on hdfs. It takes WAY longer. Threads stuck here: {code} t1 prio=10 tid=0x7f49fd4fc800 nid=0xfc4 in Object.wait() [0x7f49d0d99000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1795) - locked 0x000423dd1cc8 (a java.util.LinkedList) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1689) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1582) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1567) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:135) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1072) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1195) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:910) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:844) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:838) at org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation$HLogPutBenchmark.run(HLogPerformanceEvaluation.java:110) at java.lang.Thread.run(Thread.java:662) {code} Here are numbers I have so far for WITHOUT patch: /tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 991.258s 1008.819ops/s /tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 924.881s 1081.220ops/s /tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, 
iterations=100 took 979.312s 1021.125ops/s /tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 950.968s 5257.800ops/s /tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 939.312s 5323.045ops/s Will clean up later, but the write rate is constant whether 1 or 5 threads. Will see when 50. Looks like I need to modify HLogPE. It calls FSHLog#doWrite directly, which is not as interesting since it bypasses the locks in FSHLog when it does not call append. Will be back.
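The logged figures are internally consistent, assuming each thread did 1M puts as stated elsewhere in this thread: ops/s is just total puts divided by elapsed seconds. A trivial cross-check:

```java
// Sanity check of the logged throughput lines, assuming 1M puts per thread:
// ops/s = total puts / elapsed seconds.
class ThroughputCheck {
    static double opsPerSec(long totalPuts, double elapsedSeconds) {
        return totalPuts / elapsedSeconds;
    }
}
```

For the first single-thread run, opsPerSec(1_000_000, 991.258) gives roughly 1008.819, matching the logged "1008.819ops/s".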
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761563#comment-13761563 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack]. Looking forward to your test results on hdfs.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761065#comment-13761065 ] stack commented on HBASE-8755: -- Starting in on taking a look at this. I tried HLogPerformanceEvaluation; going via ycsb would seem to add noise and indirection. I ran with 1, 5, and 50 threads w/ sizes that are like Honghua's (key of 50 and value of 150). I ran the test like this on trunk using localfs: {code} $ for i in 1 5 50; do for j in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -nocleanup -verbose -keySize 50 -valueSize 100 /tmp/log-patch${i}.${j}.txt; done; done {code} Needs fixes over in HBASE-9460 for the above to work on localfs. With localfs, the patch is twice as slow. localfs does not support sync, so this is probably what makes the difference -- the extra threads do better w/ the dfsclient's stutter around sync. Let me try up on hdfs next. Below are the numbers. I ran each test three times: i.e. three times without the patch with one thread, then three times with the patch and one thread, etc. Each thread did 1M puts. The table is the time for the test to complete in seconds, so less is better.
||Thread Count||WithoutPatch||WithPatch||
|1|4.895|29.088|
|1|4.856|29.544|
|1|4.901|29.326|
|5|24.173|53.974|
|5|24.013|55.976|
|5|23.390|55.858|
|50|253.773|454.147|
|50|247.095|443.215|
|50|254.044|449.043|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721960#comment-13721960 ] Jean-Marc Spaggiari commented on HBASE-8755: Ok. I did some other tries and here are the results.
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2.txt
421428.8
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2-8755.txt
427172.1
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0.txt
419673.3
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0-8755.txt
432413.9
This is elapsed time. Between each iteration I totally delete (rm -rf) the hadoop directories, stop all the java processes, etc. The test is a 10M randomWrite. So unfortunately I have not been able to see any real improvement. For YCSB, is there any specific load I should run to be able to see something better than without 8755? I guess it's a write-intensive load that we want? Also, I have tested this on a pseudo-distributed instance (no longer a standalone one), but I can dedicate 4 nodes to a test if required...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721973#comment-13721973 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi], thanks for your test. Some questions about it: Is it against real HDFS? How many data-nodes and RS? What's the write pressure (client number, write thread number)? What total throughput do you get? Yes, this jira aims for a throughput improvement under write-intensive load; it should be tested and verified under write-intensive load against a real cluster / HDFS environment. And as you can see, this jira only refactors the write thread model; it does not tune any write sub-phase along the whole write path for an individual write request, so no obvious improvement is expected under low/ordinary write pressure. If you have a real cluster environment with 4 data-nodes, it would be better to re-do the test chunhui/I did with a similar test configuration/load, which is listed in detail in the comments above. 1 client with 200 write threads is OK for pressing a single RS, and 4 clients each with 200 write threads for pressing 4 RS. Thanks again.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722098#comment-13722098 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], I ran in pseudo-distributed mode for all the tests, which means HBase against a real HDFS. Tests were with 1.1.2 and 1.2.0. The test was a 10 million randomWrite test; I ran 10 of them. I will re-run some tests against a single node (still in the process of installing the OS on the other one) and will put more load against it. Like you said, 200 threads doing 100,000 writes each... More to come...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718252#comment-13718252 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], I was in the process of giving it a try. Is there a new version for 0.94 coming, or is the one attached to the JIRA already the good one?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718270#comment-13718270 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] HBASE-8755-0.94-V1.patch is the good one. Let me know if you hit any problem. Thanks Jean-Marc.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711711#comment-13711711 ] Tianying Chang commented on HBASE-8755: --- [~zjushch] In your test, you used 300 RPC handlers. Since the default is 10, and we use 50 in our production cluster, when do we really need that many RPC handlers? Will that be more overhead than benefit? Do you see write throughput go up when increasing the number of handlers? Any particular scenario where a high number of handlers helps?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708433#comment-13708433 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], the servers dedicated to those tests were too different, so I think it would not have been a good idea to run on them (difficult to interpret the results). So I just bought 3 absolutely identical nodes (the same as one I already have). By the end of the week I will have 4 servers with the same MB, same CPU, same memory (brand, PN, etc.) and same hard drive! The master will have 1xSSD+1xSATA, the others will have 2xSATA. To start, I will run YCSB on that with and without this patch as soon as I get the hardware and install the OS. Should be sometime later this week. I will also see if I can run PE. More to come.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708473#comment-13708473 ] Feng Honghua commented on HBASE-8755: - Thanks a lot [~jmspaggi], looking forward to your results.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708189#comment-13708189 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi], what's your result of running YCSB against a real cluster environment?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691138#comment-13691138 ] Jean-Marc Spaggiari commented on HBASE-8755: It was PerformanceEvaluation and not YCSB. And I ran it with only one thread. It's now running with 10 threads. I will have the results tomorrow I think, or maybe by end of day if it's going fast enough. I will run YCSB after that, but I am not sure it's relevant in standalone mode. I will still try. And very soon add another dedicated test node...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691178#comment-13691178 ] Hangjun Ye commented on HBASE-8755: --- It possibly doesn't make a significant difference in standalone mode. The point of this patch is to avoid heavy race conditions among multiple handler threads; we found many race conditions happened in the HDFS client (when many threads called write/hflush concurrently). We tested on a single RS with a 5-data-node underlying HDFS cluster using YCSB; would you like to have a try on a similar environment?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690238#comment-13690238 ] Lars Hofhansl commented on HBASE-8755: -- I ran some local tests, and even with a single-threaded client I did not see a performance degradation with this patch (0.94). I inserted 10m small rows. The cluster consists of a single RS/DN, and HDFS is backed by an SSD. So the HDFS cost of writes should be low, and that in turn should bring out any new inefficiencies introduced by this patch. So from my angle this patch is good.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690259#comment-13690259 ] Jean-Marc Spaggiari commented on HBASE-8755: Here are my results:
||Test||Trunk||8755||
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|437101.8|431561.7|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|774306.8|768559|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|21999.7|22277.6|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|134429.3|134370.9|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|112814.7|125324.2|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|78574.42|80064.5|
These numbers are times, so the smaller, the better. Overall the results are pretty much the same, except for the random writes, which are 11% slower:
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|1.27%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|0.74%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|-1.26%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|0.04%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|-11.09%|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|-1.90%|
I will redo the RandomWrite tests and try them with 1, 10 and 100 threads to see if only mono-threaded access is impacted...
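The percentage deltas in the second table above can be reproduced from the raw times in the first as (trunk - 8755) / trunk * 100, where a positive value means the patched build is faster (times are smaller). A quick sketch; the class name PeDelta and the shortened test labels are invented for illustration:

```java
// Recompute the per-test deltas from the raw PerformanceEvaluation times.
public class PeDelta {
    public static void main(String[] args) {
        String[] names = {"FilteredScan", "RandomRead", "RandomScanWithRange100",
                          "RandomSeekScan", "RandomWrite", "SequentialWrite"};
        double[][] times = { // {trunk, 8755} in the same units as the table
            {437101.8, 431561.7}, {774306.8, 768559.0}, {21999.7, 22277.6},
            {134429.3, 134370.9}, {112814.7, 125324.2}, {78574.42, 80064.5}
        };
        for (int i = 0; i < names.length; i++) {
            double pct = (times[i][0] - times[i][1]) / times[i][0] * 100;
            System.out.printf("%s: %.2f%%%n", names[i], pct); // positive = patch faster
        }
    }
}
```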
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690288#comment-13690288 ] Lars Hofhansl commented on HBASE-8755: -- Interesting. For writing the WAL it should not make any difference whether we write sequentially or at random. A possible reason is that the batching there is less efficient; maybe that brings out the extra thread hand-off needed here. (My test was the equivalent of a sequential write, but using a handcoded test.)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690375#comment-13690375 ] stack commented on HBASE-8755: -- [~jmspaggi] Thanks boss. What test is this? YCSB w/ how many clients? Thank you sir.
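For contrast, the old model's contention described above can be sketched as below. This is a hypothetical mini-model, not HBase code: the names mirror the fields mentioned in the description (updateLock, flushLock, syncTillHere, here spelled syncedTillHere), but the "append" and "sync" bodies are simulated. Every handler serializes on both locks for every put, and the only relief is skipping the sync when the watermark already covers its txid.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the old per-handler cycle: each put contends on updateLock to
// append and on flushLock to sync; a sync persists everything appended so far.
public class OldWalModelSketch {
    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    private final AtomicLong nextTxid = new AtomicLong();
    private long syncedTillHere = 0;  // guarded by flushLock
    private long syncCalls = 0;       // guarded by flushLock

    void put() {
        long txid;
        synchronized (updateLock) {        // 'append' step: all handlers contend here
            txid = nextTxid.incrementAndGet();
        }
        synchronized (flushLock) {         // 'sync' step: all handlers contend here too
            if (syncedTillHere < txid) {   // the only optimization: skip if already synced
                syncedTillHere = nextTxid.get();  // sync covers all appends so far
                syncCalls++;
            }
        }
    }

    // Returns {final watermark, number of actual sync calls}.
    static long[] run(int threads, int putsPerThread) throws InterruptedException {
        OldWalModelSketch wal = new OldWalModelSketch();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < putsPerThread; j++) wal.put(); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return new long[] { wal.syncedTillHere, wal.syncCalls };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(8, 1000);
        System.out.println("synced=" + r[0] + " syncCalls=" + r[1]);
    }
}
```

With low concurrency nearly every put pays for its own sync; the skip only helps when many handlers pile up behind flushLock, which matches the observation that this optimization "helps much less than expected".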
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688936#comment-13688936 ] Feng Honghua commented on HBASE-8755: - [~zjushch] We ran the same tests as yours; the results are below:
1) One YCSB client with 5/50/200 write threads respectively
2) One RS with 300 RPC handlers, 20 regions (5 data-node back-end HDFS running CDH 4.1.1)
3) row-size = 150 bytes

threads  row-count  new-throughput  new-latency  old-throughput  old-latency
----------------------------------------------------------------------------
5        20         3191            1.551(ms)    3172            1.561(ms)
50       200        23215           2.131(ms)    7437            6.693(ms)
200      200        35793           5.450(ms)    10816           18.312(ms)

A) The difference is negligible with 5 YCSB client threads;
B) The new model still shows a 3X+ improvement over the old model at 50/200 threads.

Can anybody else help run similar tests using the same test configuration as Chunhui?