[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2014-01-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882220#comment-13882220
 ] 

Hudson commented on HBASE-8755:
---

FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #65 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/65/])
HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low 
contention) (stack: rev 1561450)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedLogCloseException.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedSyncBeforeLogCloseException.java
* /hbase/trunk/hbase-server/pom.xml
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
* /hbase/trunk/pom.xml


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2014-01-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882215#comment-13882215
 ] 

Hudson commented on HBASE-8755:
---

SUCCESS: Integrated in HBase-TRUNK #4856 (See 
[https://builds.apache.org/job/HBase-TRUNK/4856/])
HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low 
contention) (stack: rev 1561450)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedLogCloseException.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedSyncBeforeLogCloseException.java
* /hbase/trunk/hbase-server/pom.xml
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
* /hbase/trunk/pom.xml


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2014-01-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867544#comment-13867544
 ] 

ramkrishna.s.vasudevan commented on HBASE-8755:
---

Nice one.  

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).
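
The numbered steps above describe a classic multi-stage producer/consumer pipeline: handlers publish edits and block, while dedicated writer/flusher/notifier threads batch the hdfs append, the hdfs sync, and the wake-ups. As a reading aid only, here is a minimal Java sketch of that shape; every name in it is hypothetical rather than the real FSHLog API, and the AsyncNotifier stage is folded into the flusher for brevity. (The HBASE-10156 refactor committed above appears to rework this same pipeline around RingBufferTruck/SyncFuture, so the sketch mirrors the model as described in this issue, not the final trunk code.)

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal, self-contained sketch of the pipeline described in steps 1-6 above.
// All names (AsyncWalSketch, append, syncer, the lock objects) are illustrative,
// not the real FSHLog API.
public class AsyncWalSketch {
  private final List<String> pendingEdits = new ArrayList<>(); // HLog's local pending buffer
  private final AtomicLong txidCounter = new AtomicLong(0);
  private volatile long writtenTxid = 0;  // highest txid handed to the (pretend) hdfs writer
  private volatile long syncedTxid = 0;   // highest txid known durable (the sync watermark)
  private final Object bufferLock = new Object(); // AsyncWriter waits here for new edits
  private final Object flushLock = new Object();  // AsyncFlusher waits here for new writes
  private final Object notifyLock = new Object(); // put handlers wait here for their sync

  // Step 1: a put handler appends an edit and gets back its txid.
  public long append(String edit) {
    synchronized (bufferLock) {
      pendingEdits.add(edit);
      long txid = txidCounter.incrementAndGet();
      bufferLock.notify();                // there are new edits in the local buffer
      return txid;
    }
  }

  // Step 2: the handler blocks until a sync covering its txid has completed.
  public void syncer(long txid) throws InterruptedException {
    synchronized (notifyLock) {
      while (syncedTxid < txid) {
        notifyLock.wait();
      }
    }
  }

  // Step 3: single AsyncWriter thread -- drain the buffer, "append" to hdfs.
  private void asyncWriterLoop() throws InterruptedException {
    while (true) {
      long lastTxid;
      synchronized (bufferLock) {
        while (pendingEdits.isEmpty()) {
          bufferLock.wait();
        }
        List<String> batch = new ArrayList<>(pendingEdits); // pretend hlog.writer.append(batch)
        pendingEdits.clear();
        lastTxid = txidCounter.get();
      }
      synchronized (flushLock) {
        writtenTxid = lastTxid;
        flushLock.notify();               // new writes to hdfs need a sync
      }
    }
  }

  // Steps 4+5: single AsyncFlusher thread -- "sync" hdfs, then wake the handlers.
  private void asyncFlusherLoop() throws InterruptedException {
    while (true) {
      long toSync;
      synchronized (flushLock) {
        while (writtenTxid <= syncedTxid) {
          flushLock.wait();
        }
        toSync = writtenTxid;             // pretend hlog.writer.sync() happens here
      }
      synchronized (notifyLock) {
        syncedTxid = toSync;              // raise the sync watermark
        notifyLock.notifyAll();           // release every handler waiting in syncer()
      }
    }
  }

  public static void main(String[] args) throws Exception {
    AsyncWalSketch wal = new AsyncWalSketch();
    Thread writer = new Thread(() -> { try { wal.asyncWriterLoop(); } catch (InterruptedException e) {} }, "AsyncWriter");
    Thread flusher = new Thread(() -> { try { wal.asyncFlusherLoop(); } catch (InterruptedException e) {} }, "AsyncFlusher");
    writer.setDaemon(true); flusher.setDaemon(true);
    writer.start(); flusher.start();
    long txid = wal.append("row1/cf:q=v"); // what a put handler would do
    wal.syncer(txid);
    System.out.println("edit durable at txid " + txid);
  }
}
{code}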



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847690#comment-13847690
 ] 

stack commented on HBASE-8755:
--

I did a comparison over three hours and throughput flips in favor of this patch 
the longer we run (15% more writes after three hours).

Let me commit.  This patch makes for better throughput under high contention, 
cleans up the code, and lays groundwork for multiwal by detaching syncing from 
handlers, but I don't like the slowdown at low numbers.  Let us fix that 
promptly.  I created a subtask to address this and assigned it to myself.


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847698#comment-13847698
 ] 

Andrew Purtell commented on HBASE-8755:
---

Yes [~stack], let's get this out into a near release so we can see how it holds 
up. If we see perf problems during RC evaluation, we can revert if necessary.

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847958#comment-13847958
 ] 

Hudson commented on HBASE-8755:
---

SUCCESS: Integrated in HBase-TRUNK #4722 (See 
[https://builds.apache.org/job/HBase-TRUNK/4722/])
HBASE-8755 A new write thread model for HLog to improve the overall HBase write 
throughput (stack: rev 1550778)
* /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848055#comment-13848055
 ] 

Hudson commented on HBASE-8755:
---

SUCCESS: Integrated in HBase-0.98 #11 (See 
[https://builds.apache.org/job/HBase-0.98/11/])
HBASE-8755 A new write thread model for HLog to improve the overall HBase write 
throughput (stack: rev 1550782)
* /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848092#comment-13848092
 ] 

Hudson commented on HBASE-8755:
---

SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #8 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/8/])
HBASE-8755 A new write thread model for HLog to improve the overall HBase write 
throughput (stack: rev 1550782)
* /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848207#comment-13848207
 ] 

Hudson commented on HBASE-8755:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #5 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/5/])
HBASE-8755 A new write thread model for HLog to improve the overall HBase write 
throughput (stack: rev 1550778)
* /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java


 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846412#comment-13846412
 ] 

stack commented on HBASE-8755:
--

All the Async* are just hanging out waiting.  Missed notification or no 
notification?
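
For readers following along: a "missed notification" in this kind of hand-off usually means the shared condition was updated, or notify() was called, outside the monitor the waiting thread checks, so the wakeup lands in the gap between that thread's check and its wait(). A hedged, illustrative sketch of the guarded-wait idiom that avoids it (not FSHLog code; all names here are made up):

{code}
// Illustrative sketch of the "missed notification" hazard, not FSHLog code.
// If the condition is published or notify() is called without holding the same
// monitor the consumer checks-then-waits on, the wakeup can land between the
// consumer's check and its wait(), and the consumer then sleeps forever --
// which in a thread dump looks exactly like "everything parked in wait()".
public class MissedNotifyDemo {
  private final Object lock = new Object();
  private boolean workPending = false;   // only read/written while holding lock

  // Safe consumer: re-check the condition in a loop while holding the lock.
  void consumer() throws InterruptedException {
    synchronized (lock) {
      while (!workPending) {
        lock.wait();
      }
      workPending = false;               // ... drain the pending buffer here ...
    }
  }

  // Safe producer: publish the condition and notify under the same lock.
  void producer() {
    synchronized (lock) {
      workPending = true;
      lock.notify();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    MissedNotifyDemo d = new MissedNotifyDemo();
    Thread c = new Thread(() -> {
      try { d.consumer(); } catch (InterruptedException ignored) { }
    }, "AsyncWriter-like-consumer");
    c.start();
    d.producer();
    c.join();                            // returns promptly; no lost wakeup
    System.out.println("consumer finished");
  }
}
{code}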

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 
 HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
 HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
 HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, 
 HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846583#comment-13846583
 ] 

stack commented on HBASE-8755:
--

I notice a syncer exited (it is missing from the thread dump).

2013-12-11 22:40:28,887 INFO  
[regionserver60020-AsyncHLogSyncer3-1386830380159] wal.FSHLog: 
regionserver60020-AsyncHLogSyncer3-1386830380159 exiting

On a new run, again a thread exits:

2013-12-12 10:34:34,620 DEBUG [regionserver60020.logRoller] 
regionserver.LogRoller: HLog roll requested
2013-12-12 10:34:35,526 DEBUG [regionserver60020.logRoller] wal.FSHLog: 
cleanupCurrentWriter  waiting for transactions to get synced  total 37240 
synced till here 37210
2013-12-12 10:34:36,560 INFO  
[regionserver60020-AsyncHLogSyncer1-1386873205908] wal.FSHLog: 
regionserver60020-AsyncHLogSyncer1-1386873205908 exiting


Let me try and figure the why.
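
One plausible mechanism for a syncer silently vanishing like this: if the run() loop lets an unchecked exception escape, the thread prints its "exiting" message (if any) and dies, leaving nothing in later thread dumps. A small illustrative sketch, with made-up names and a simulated failure rather than the real FSHLog.AsyncSyncer code, of why wrapping the loop in catch (Throwable) surfaces the cause (as a later comment in this thread does, turning up an NPE):

{code}
// Illustrative only -- not the actual FSHLog.AsyncSyncer code. A worker thread
// whose run() lets a RuntimeException escape simply exits, which is why the
// syncer is missing from the thread dump above. Catching Throwable around the
// loop surfaces the cause instead of letting the thread die silently.
public class SyncerExitDemo {

  static class AsyncSyncer extends Thread {
    AsyncSyncer(String name) { super(name); }

    private void doSyncWork() {
      String writer = null;         // simulate the kind of bug found later (writer is null)
      writer.length();              // throws NullPointerException
    }

    @Override
    public void run() {
      try {
        while (!isInterrupted()) {
          doSyncWork();
        }
      } catch (Throwable t) {
        // without this catch, the NPE would kill the thread with no trace in our logs
        System.err.println(getName() + " UNEXPECTED: " + t);
      } finally {
        System.err.println(getName() + " exiting"); // mirrors the "... exiting" lines above
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread syncer = new AsyncSyncer("regionserver60020-AsyncHLogSyncer1");
    syncer.start();
    syncer.join();                  // prints UNEXPECTED + exiting, then the thread is gone
  }
}
{code}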

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 
 HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
 HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
 HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, 
 HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846657#comment-13846657
 ] 

stack commented on HBASE-8755:
--

Catching all exceptions, I got an NPE.

2013-12-12 11:03:43,870 INFO  [Thread-14] regionserver.DefaultStoreFlusher: 
Flushed, sequenceid=2680455, memsize=129.3 M, hasBloomFilter=true, into tmp 
file 
hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/usertable/8dcf9c17c090f476346c8a31e4c9eddb/.tmp/64aaf14b38224f4fbce0a999f92dd8f4
2013-12-12 11:03:43,879 INFO  
[regionserver60020-AsyncHLogSyncer2-1386874930310] wal.FSHLog: UNEXPECTED
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1205)
at java.lang.Thread.run(Thread.java:744)

Writer is null.
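
A null writer inside AsyncSyncer.run is consistent with the syncer dereferencing the writer field while a log roll has it swapped out (the preceding log shows cleanupCurrentWriter racing with in-flight transactions). Purely as an illustration of the kind of guard that avoids the dereference, and not the fix that was actually committed, here is a minimal sketch with hypothetical names:

{code}
import java.io.IOException;

// Minimal sketch (assumption, not the committed fix): read the writer field
// once into a local and treat null -- e.g. mid log roll -- as "nothing to sync
// on this pass" instead of dereferencing it.
public class GuardedSyncSketch {

  // Stand-in for the WAL writer; only sync() matters for this sketch.
  interface Writer {
    void sync() throws IOException;
  }

  private volatile Writer writer;   // swapped (and briefly null) during log roll

  // What a hypothetical AsyncSyncer loop would call instead of writer.sync() directly.
  boolean tryToSync() {
    Writer w = this.writer;         // single read; the field may change under us
    if (w == null) {
      return false;                 // mid-roll: skip this pass rather than NPE
    }
    try {
      w.sync();
      return true;
    } catch (IOException e) {
      System.err.println("sync failed: " + e);
      return false;
    }
  }

  public static void main(String[] args) {
    GuardedSyncSketch hlog = new GuardedSyncSketch();
    System.out.println("synced=" + hlog.tryToSync()); // false: writer is null, no NPE
    hlog.writer = () -> { /* pretend hdfs hflush */ };
    System.out.println("synced=" + hlog.tryToSync()); // true
  }
}
{code}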

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 
 HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
 HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
 HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, 
 HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846984#comment-13846984
 ] 

Hadoop QA commented on HBASE-8755:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618499/8755v8.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:red}-1 hadoop1.0{color}.  The patch failed to compile against the 
hadoop 1.0 profile.
Here is snippet of errors:
{code}[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) 
on project hbase-server: Compilation failure: Compilation failure:
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,42]
 unclosed string literal
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,63]
 ';' expected
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,17]
 illegal start of expression
[ERROR] 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,23]
 ';' expected
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) 
on project hbase-server: Compilation failure
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
--
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation 
failure
at 
org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729)
at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:128)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more{code}

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8148//console

This message is automatically generated.

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, 
 HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, 
 HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, 
 HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846994#comment-13846994
 ] 

stack commented on HBASE-8755:
--

This is what I'll commit.  I've been running it on a small cluster this afternoon 
and, after fixing hardware, it seems to run fine at about the same speed as what 
we have currently (ycsb read/write loading).

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846997#comment-13846997
 ] 

stack commented on HBASE-8755:
--

32 threads on one node writing to a cluster of 4 nodes (~8 threads per server, 
a load at which our tests to date show this model running slower than what we 
have).  It does 10% less throughput after ~25 minutes.  We need to get the 
other speedups in after this goes in.

 A new write thread model for HLog to improve the overall HBase write 
 throughput
 ---

 Key: HBASE-8755
 URL: https://issues.apache.org/jira/browse/HBASE-8755
 Project: HBase
  Issue Type: Improvement
  Components: Performance, wal
Reporter: Feng Honghua
Assignee: stack
Priority: Critical
 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 
 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
 HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, 
 HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, 
 HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, 
 thread.out


 In the current write model, each write handler thread (executing put()) will 
 individually go through a full 'append (hlog local buffer) => HLog writer 
 append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
 which incurs heavy contention on updateLock and flushLock.
 The only existing optimization, checking whether the current syncTillHere > txid 
 in the hope that another thread has already written/synced its own txid to hdfs 
 so the write/sync can be skipped, actually helps much less than expected.
 Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
 proposed a new write thread model for writing hdfs sequence files, and the 
 prototype implementation shows a 4X throughput improvement (from 17000 to 
 70000+).
 I applied this new write thread model to HLog, and the performance test in our 
 test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row size) 
 even beats that of BigTable (the Percolator paper published in 2011 reports 
 Bigtable's write throughput then as 31002). I can provide the detailed 
 performance test results if anyone is interested.
 The change for the new write thread model is as below:
  1. All put handler threads append their edits to HLog's local pending buffer 
 (each notifies the AsyncWriter thread that there are new edits in the local 
 buffer);
  2. All put handler threads wait in the HLog.syncer() function for the 
 underlying threads to finish the sync that covers their txid;
  3. A single AsyncWriter thread is responsible for retrieving all the buffered 
 edits from HLog's local pending buffer and writing them to hdfs 
 (hlog.writer.append); it notifies the AsyncFlusher thread that there are new 
 writes to hdfs that need a sync;
  4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to 
 persist the writes made by the AsyncWriter; it notifies the AsyncNotifier 
 thread that the sync watermark has increased;
  5. A single AsyncNotifier thread is responsible for notifying all pending put 
 handler threads that are waiting in the HLog.syncer() function;
  6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do 
 the job it did).



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847069#comment-13847069
 ] 

Hadoop QA commented on HBASE-8755:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618509/8755v9.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8149//console

This message is automatically generated.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847194#comment-13847194
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack] thanks. Seems there are no further blocking issues?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-12 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847195#comment-13847195
 ] 

Feng Honghua commented on HBASE-8755:
-

bq. It does 10% less throughput after ~25 minutes.
==> this is normal when flush/compaction occurs after ~25 minutes of writing; we 
saw the same level of degradation in long-running tests both with and without 
the patch.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846090#comment-13846090
 ] 

stack commented on HBASE-8755:
--

Pardon me. This is taking a while; hardware issues, and now trunk seems to have 
an issue where it hangs syncing, pre-patch I believe... investigating.

Here is what I see. Lots of threads BLOCKED here:
{code}
RpcServer.handler=0,port=60020 daemon prio=10 tid=0x012f1800 
nid=0x3cb5 waiting for monitor entry [0x7fdb0eb55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1006)
- waiting to lock 0x000456c00390 (a java.lang.Object)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1054)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:744)
{code}

Then the fella w/ the lock is doing this:

{code}
regionserver60020.logRoller daemon prio=10 tid=0x01159800 nid=0x3ca7 
in Object.wait() [0x7fdb0f964000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1307)
- locked 0x000456bf37a8 (a java.util.concurrent.atomic.AtomicLong)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1299)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1412)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:760)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:566)
- locked 0x000456c00390 (a java.lang.Object)
- locked 0x000456c00330 (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
at java.lang.Thread.run(Thread.java:744)
{code}

Server is bound up.
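
Read together, the two dumps reduce to the blocking pattern below (a 
stripped-down sketch, not the real FSHLog; only updateLock, syncedTillHere and 
unflushedEntries are taken from the traces and the quoted code, everything else 
is invented): the log roller enters rollWriter() holding updateLock and then 
parks in syncer() waiting on the syncedTillHere monitor, while every RPC handler 
blocks trying to enter append() on that same updateLock, so nothing moves until 
some other thread advances syncedTillHere and calls notifyAll().

{code}
import java.util.concurrent.atomic.AtomicLong;

public class RollSyncBlockSketch {
  private final Object updateLock = new Object();
  private final AtomicLong unflushedEntries = new AtomicLong(0);
  private final AtomicLong syncedTillHere = new AtomicLong(0);

  // RpcServer.handler threads: blocked waiting to enter this monitor (FSHLog.append).
  public long append() {
    synchronized (updateLock) {
      return unflushedEntries.incrementAndGet();
    }
  }

  // regionserver60020.logRoller: holds updateLock, then waits inside syncer().
  public void rollWriter() throws InterruptedException {
    synchronized (updateLock) {
      // cleanupCurrentWriter() -> sync() -> syncer(unflushedEntries.get())
      syncer(unflushedEntries.get());
    }
  }

  private void syncer(long txid) throws InterruptedException {
    synchronized (syncedTillHere) {
      while (syncedTillHere.get() < txid) {
        syncedTillHere.wait();   // the WAITING frame at FSHLog.syncer() in the dump
      }
    }
  }

  // Hypothetical stand-in for the AsyncFlusher/AsyncNotifier work: unless some
  // thread eventually runs this, rollWriter() never returns and append() never starts.
  public void markSynced(long txid) {
    synchronized (syncedTillHere) {
      if (txid > syncedTillHere.get()) {
        syncedTillHere.set(txid);
      }
      syncedTillHere.notifyAll();
    }
  }
}
{code}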


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846107#comment-13846107
 ] 

stack commented on HBASE-8755:
--

Hmm... Happens when this patch is in place.  Stuck here:

{code}
regionserver60020.logRoller daemon prio=10 tid=0x7f6f08822800 nid=0x5b0a 
in Object.wait() [0x7f6eeccef000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:503)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1304)
- locked 0x00045756db98 (a java.util.concurrent.atomic.AtomicLong)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1296)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1409)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:759)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:565)
- locked 0x00045756dc70 (a java.lang.Object)
- locked 0x00045756dc10 (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
at java.lang.Thread.run(Thread.java:744)
{code}

Which is here:

1294   // sync all known transactions
1295   private void syncer() throws IOException {
1296     syncer(this.unflushedEntries.get()); // sync all pending items
1297   }
1298
1299   // sync all transactions upto the specified txid
1300   private void syncer(long txid) throws IOException {
1301     synchronized (this.syncedTillHere) {
1302       while (this.syncedTillHere.get() < txid) {
1303         try {
1304           this.syncedTillHere.wait();
1305
1306           if (txid <= this.failedTxid.get()) {
1307             assert asyncIOE != null :
1308               "current txid is among(under) failed txids, but asyncIOE is null!";
1309             throw asyncIOE;
1310           }
1311         } catch (InterruptedException e) {
1312           LOG.debug("interrupted while waiting for notification from AsyncNotifier");
1313         }
1314       }
1315     }
1316   }

All other threads are trying to do an appendnosync:

{code}
RpcServer.handler=0,port=60020 daemon prio=10 tid=0x7f6f08a26800 
nid=0x5b1b waiting for monitor entry [0x7f6eebee1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1005)
- waiting to lock 0x00045756dc70 (a java.lang.Object)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1053)
at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
at 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
at 
org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
at java.lang.Thread.run(Thread.java:744)
{code}

... but can't make progress blocked on updateLock.

{code}
 995   private long append(HRegionInfo info, TableName tableName, WALEdit edits, List<UUID> clusterIds,
 996       final long now, HTableDescriptor htd, boolean doSync, boolean isInMemstore,
 997       AtomicLong sequenceId, long nonceGroup, long nonce) throws IOException {
 998     if (edits.isEmpty()) return this.unflushedEntries.get();
 999     if (this.closed) {
1000       throw new IOException("Cannot append; log is closed");
1001     }
1002     TraceScope traceScope = Trace.startSpan("FSHlog.append");
1003     try {
1004       long txid = 0;
1005       synchronized (this.updateLock) {
1006         // get the sequence number from the passed Long. In normal flow, it is coming from the
1007         // region.
1008         long seqNum = sequenceId.incrementAndGet();
...
{code}

The updateLock is held when rolling the log here:

 562 synchronized (updateLock) {
 563   // Clean up 

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846110#comment-13846110
 ] 

stack commented on HBASE-8755:
--

Will provide more on this tomorrow.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846130#comment-13846130
 ] 

Feng Honghua commented on HBASE-8755:
-

The stack-traces are OK: when a log-roll occurs, it holds the updateLock to 
prevent any subsequent write handler from putting edits into pendingWrites 
(that's why all write handler threads pend on updateLock), and then it calls 
*sync* to wait for the Async* threads to sync all edits currently in 
pendingWrites (that's why the logroller pends in sync())...
As to why there is no progress, we need to see why the Async* threads don't 
finish the sync of the current pendingWrites; would you please provide the 
Async* threads' stack traces?
Of the Async* threads, only AsyncWriter needs the pendingWritesLock to grab the 
edits from pendingWrites, and AsyncNotifier needs syncTillHere to update it and 
notifyAll; both of these should be acquirable in the current situation: write 
handler threads can't hold pendingWritesLock before holding updateLock (within 
append()), the logroller doesn't hold pendingWritesLock at all, and the 
logroller is only waiting for syncTillHere to change... (a sketch of this lock 
ordering follows below)
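
The lock-ordering argument above can be summarized in a small sketch (field 
names borrowed from the discussion, methods invented purely for illustration, 
not meant as a runnable workload): under this acquisition order none of the 
threads blocked in the dumps holds anything the Async* threads need, which is 
why the Async* threads' own stack traces are the missing piece.

{code}
// Illustration of who takes which monitor, per the description above.
public class LockOrderSketch {
  private final Object updateLock = new Object();
  private final Object pendingWritesLock = new Object();
  private final Object syncedTillHere = new Object();

  void handlerAppend() {                    // put handler: updateLock, then pendingWritesLock
    synchronized (updateLock) {
      synchronized (pendingWritesLock) {
        // add edit to pendingWrites
      }
    }
  }

  void asyncWriterLoop() {                  // AsyncWriter: only pendingWritesLock, never updateLock
    synchronized (pendingWritesLock) {
      // grab the buffered edits
    }
  }

  void asyncNotifierLoop() {                // AsyncNotifier: only the syncedTillHere monitor
    synchronized (syncedTillHere) {
      syncedTillHere.notifyAll();           // bump the watermark, wake handlers and the roller
    }
  }

  void logRollerRoll() throws InterruptedException {  // log roller: updateLock, never pendingWritesLock
    synchronized (updateLock) {
      synchronized (syncedTillHere) {
        syncedTillHere.wait();              // waits for AsyncNotifier
      }
    }
  }
}
{code}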



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844108#comment-13844108
 ] 

Hadoop QA commented on HBASE-8755:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12617842/HBASE-8755-trunk-v6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8118//console

This message is automatically generated.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844827#comment-13844827
 ] 

stack commented on HBASE-8755:
--

All sounds good above [~fenghh].  I'm running a few tests here on a cluster to 
see that it basically works.  Any other comments by anyone else?  Otherwise, I 
was planning on committing.  We can work on further speedups in new issues; e.g. 
see if we can use fewer threads, as per [~himan...@cloudera.com]



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844838#comment-13844838
 ] 

Ted Yu commented on HBASE-8755:
---

+1 from me.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844839#comment-13844839
 ] 

Jonathan Hsieh commented on HBASE-8755:
---


I did most of a review of v4 last week -- here are a few nits:

nit:  (fix on commit)
{code}
+  //up and holds the lock
+  // NOTE! can't hold 'upateLock' here since rollWriter will pend
+  // on 'sync()' with 'updateLock', but 'sync()' will wait for
+  // AsyncWriter/AsyncSyncer/AsyncNotifier series. without upateLock
+  // can leads to pendWrites more than pendingTxid, but not problem
{code}
spelling: upate -> update


This can go in a follow-up issue -- please add a description of the threads / 
queues / invariants and how a WAL write happens to the class javadoc. An updated 
version of the 1-6 list in the description would be great.



Good stuff [~fenghh]!



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844859#comment-13844859
 ] 

Himanshu Vashishtha commented on HBASE-8755:


Thanks for the explanation [~fenghh]; it pretty much answers all my questions. 
Also, looking more, getting rid of the LogSyncer thread eases the locking 
semantics of rolling.

Yes, I reran the above experiments in a more standard environment (4 DNs with 
HLogPE running on a DN, and the log level set to INFO instead of DEBUG), and got 
mixed results this time. I varied the thread count from 2 to 100 and didn't get 
a clear winner. Given the current state of this patch and the cleanup it does, I 
am +1 for committing this.

Looking forward to it. Thanks.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845035#comment-13845035
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks [~jmhsieh], I've made and attached a new patch based on your comments.
Thanks everyone again for your valuable comments and feedback.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845038#comment-13845038
 ] 

Feng Honghua commented on HBASE-8755:
-

[~v.himanshu]:

Looking forward to your performance comparison test results (trunk, 
with-this-patch, with-syncer-only) using the latest HLogPE with the new 
'appendWithoutSync + sync' logic. I'll also try to do the same test as a double 
comparison/confirmation.

Performance improvement is the first and foremost goal of this patch; the code 
cleanup is just a by-the-way side-effect, so we want to see this patch 
accepted/checked-in because of the performance improvement it brings, not merely 
because of the code cleanup it does :-)



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845103#comment-13845103
 ] 

Hadoop QA commented on HBASE-8755:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12618171/HBASE-8755-trunk-v7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8132//console

This message is automatically generated.

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845106#comment-13845106
 ] 

stack commented on HBASE-8755:
--

I'm running some tests locally just to make sure.  Will report back...


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-09 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843224#comment-13843224
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack]: thanks for the comments; below are my replies after making the corresponding 
changes. A new patch based on [~v.himanshu]'s latest v5 patch is attached (thanks 
[~v.himanshu]).
bq.Remove these asserts rather than comment them out given they depended on a 
facility this patch removes.
== done
bq.using a Random for choosing an arbitrary thread for a list of 4 is 
heavyweight
== done
bq.Please remove all mentions of AsyncFlush since it no longer exists
== done
bq.Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE; 
 Should it be >= failedTxid?
== this comment is right: a txid larger than failedTxid isn't sync-ed by the sync 
that reported failedTxid, but a txid smaller than or equal to failedTxid is (not 
necessarily failed in fact, but since we don't maintain a txid-range-to-syncer 
mapping, we fail all txids smaller than or equal to failedTxid; this aligns with 
HBase's write semantics of 'a failed write may in fact have succeeded'. This is a 
point we can refine later by adding a txid-range-to-sync-operation mapping to 
indicate failures precisely)
bq.This should be volatile since it is set by AsyncSync and then used by the 
main FSHLog thread (you have an assert to check it not null – maybe you ran 
into an issue here already?):   + private IOException asyncIOE = null;
== done
bq.'bufferLock' if a very generic name. Could it be more descriptive? It is a 
lock held for a short while while AsyncWriter moves queued edits off the 
globally seen queue to a local queue just before we send the edits to the WAL. 
You add a method named getPendingWrites that requires this lock be held. Could 
we tie the method and the lock together better? Name it pendingWritesLock? (The 
name of the list to hold the pending writes is pendingWrites).
== done
bq. (because the HDFS write-method is pretty heavyweight as far as locking is 
concerned.) I think the heavyweight referred to in the above is hbase 
locking...please adjust the comment
== done
bq.Comments on what these threads do will help the next code reader
== done
bq.Your patch should remove the optional flush config from hbase-default.xml 
too since it no longer is relevant
== done
bq.A small nit is you might look at other threads in hbase and see how they are 
named...It might be good if these better align
== done
bq.Probably make the number of asyncsyncers a configuration
== done
bq.but we do not seem to be doing it on the other call to doWrite at around 
line #969 inside append
== doWrite is called inside append, and the bufferLock (now renamed to 
pendingWritesLock) is held there
bq.This method (setPendingTxid) is only called at close time if I read the patch 
right
== it's called inside append() once doWrite() is done, to notify AsyncWriter that 
there are new pendingWrites to write to HDFS; it's not called at close time. 
you can double check it :-)
bq.Is this 'fatal'? Or is it an 'error' 
== done
bq.and request a log roll yet we carry on to try and sync, an op that will 
likely fail? We are ok here? We updated the write txid but not the sync txid so 
that should be fine.
== we can't retry the sync after a log roll, since we can't sync against the new 
hlog when the writes were written to the old hlog. we fail all the transactions 
with txid <= write txid, so it's ok here.
bq.Do we need this: if (!asyncSyncers[ i ].isSyncing())  DFSClient will allow us 
to call sync concurrently. I think DFSClient allows us to call sync 
concurrently... HDFS will handle (synchronize) concurrent syncs?
bq.Can these be static classes or do they need context from the hosting FSHLog?
== they all need context from the hosting FSHLog (such as 
writer/asyncWriter/asyncSyncer/asyncNotifier)
bq.These method names should not talk about 'flush'. They should be named 
'sync' instead. Same for the flushlock.
== done
bq.Why atomic boolean and not just a volatile here? private AtomicBoolean 
isSyncing = new AtomicBoolean(false);
== done
bq.The above is very important. All your threads do this?
== yes
bq.It talks about writeChunk being expensive but we are not doing anything to 
ameliorate dfsclient writes if I dig down into our log writer
== done (remove it)

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-09 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843227#comment-13843227
 ] 

Feng Honghua commented on HBASE-8755:
-

I'll read and update according to [~v.himanshu]'s comments tomorrow. Thanks.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-09 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843231#comment-13843231
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


For the random, you can also use something like System.currentTimeMillis() % 
asyncSyncers.length; Not saying that yours is not correct ;)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-09 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843925#comment-13843925
 ] 

Feng Honghua commented on HBASE-8755:
-

[~v.himanshu]: 
bq.1) log rolling thread safety: Log rolling happens in parallel with 
flush/sync. Currently, in FSHLog, sync call grabs the updateLock to ensure it 
has a non-null writer (because of parallel log rolling). How does this patch 
address the non-null writer? Or is it not needed anymore? Also, if you go for 
the updatelock in sync, that might result in deadlock.
== Good question :-). In rollWriter(), before switching this.writer to the 
newly created writer, *updateLock is held* and *cleanupCurrentWriter() is 
called*. Holding updateLock guarantees no new edits enter pendingWrites, and 
calling cleanupCurrentWriter() guarantees all edits currently in pendingWrites 
are written to hdfs and sync-ed (inside this method, 'sync()' is called to 
provide this guarantee). This means that when this.writer is switched to the new 
HLog writer, no new edits enter and all current edits have already been sync-ed 
to hdfs, so the AsyncWriter/AsyncSyncer threads have nothing to do and are idle; 
there is no log rolling thread-safety issue here
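A rough sketch of that argument, for illustration only (the real FSHLog rollWriter() does more work, such as creating the new writer and archiving old logs; cleanupCurrentWriter() here is just a stub):
{code}
// Illustrative sketch of the log-roll safety argument; not FSHLog's actual code.
class RollWriterSketch {
  interface Writer { }                        // stands in for the HLog.Writer interface

  private final Object updateLock = new Object();
  private volatile Writer writer;

  void rollWriter(Writer nextWriter) throws Exception {
    synchronized (updateLock) {
      // 1. While updateLock is held, no new edits can enter pendingWrites.
      cleanupCurrentWriter();                 // 2. Writes and syncs everything still pending (calls sync()).
      writer = nextWriter;                    // 3. Safe swap: AsyncWriter/AsyncSyncer are idle at this point.
    }
  }

  private void cleanupCurrentWriter() throws Exception {
    // In the real code this drains pendingWrites to hdfs and syncs the old writer.
  }
}
{code}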
bq.2) Error handling: It is not very clear how is flush/sync failures are being 
handled?... Let's say there are two handlers waiting for sync, t1 on txid 8 and 
t2 on txid 10. And, t1 wakes up on notification. Would t1 also get this 
exception? Wouldn't it be wrong, because txid 8 may have succeeded? Please 
correct me if I missed anything.
== your understanding here is correct, but the write semantics of HBase are 
'a successful write response means a successful write, but a failed write response 
can mean either a successful write or a failed write', right? and I have 
already mentioned this in the comment above. this behavior can be improved by 
adding a txid-range-to-sync mapping, which would indicate exactly which pending 
txid range a failed sync covers
bq.3) I think the current HLogPE doesn't do justice to the real use 
case...Almost all HLog calls are appendNoSync, followed by a sync call. In the 
current HLogPE, we are calling append calls, which also does the sync
== you're right, but looking at the write process as a whole, these two have no 
big performance difference: append() calls sync() internally, and in the real case 
of HRegion.java, appendNoSync plus sync is called. since sync is now just waiting 
on a notification, the difference between the two behaviors is only whether the 
thread starts waiting sooner or later, with no impact on overall write performance 
(after putting edits into pendingWrites, the write handler thread can only wait for 
the write/sync to hdfs to finish and can't help/influence write performance)... right?
bq.4) Perf numbers are super impressive. It would have been wonderful to have 
such numbers for lesser number of handler threads also (e.g., 5-10 threads). 
IMHO, that represents most common case scenario, but I could be wrong. I know 
this has been beaten to death in the discussions above, but just echoing my 
thoughts here
== I think many people may hold the same point of view as you here, but I also 
have my own thoughts on this :-) Throughput is different from latency: latency 
describes how quickly a system serves a user request, the quicker the better; 
throughput describes how many requests a system serves within a given time frame, 
the more the better. Both indicate a system's capability for serving user 
requests, just from different angles. Certainly, for an application with 
low-to-medium write stress, fewer client write threads issuing requests is OK; 
but for an application with high write stress, users/clients will be unhappy if 
the system just can't reach their real-world throughput *no matter* how many 
client threads are configured/added. With the improvement in this patch, we at 
least *can* satisfy such high-throughput requirements by adding client threads. 
And without improving an individual write request's latency (this patch does 
nothing for individual write latency), it's hard to improve throughput for a 
*synchronous* client write thread (it can issue the next request only after it's 
done with the current one); that's why this patch has less effect with a single 
or few write threads. If HBase supported *asynchronous* client write threads, I 
think this patch could also provide a big improvement even with fewer client 
write threads.
bq.5) I should also mention that while working on a different use case, I was 
trying to bring a layer of indirection b/w regionserver handlers and sync 
operation (Sync is the most costly affair in all HLog story). ...
== glad to see a similar approach :-). I wonder if it shows the same improvement 
under heavy write stress (such as 50/100/200 client write threads). Looking 
forward to seeing your final test results :-)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-09 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843926#comment-13843926
 ] 

Feng Honghua commented on HBASE-8755:
-

[~jmspaggi]
bq.For the random, you can also use something like System.currentTimeMillis() % 
asyncSyncers.length
== yeah, System.currentTimeMillis() is a good candidate; txid can do the same 
thing as well. thanks :-)
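A tiny illustration of the txid-based choice (not the patch's code; the hand-off call to the chosen syncer is omitted since its exact name depends on the patch version):
{code}
// Deterministic, cheap alternative to java.util.Random for spreading work across syncers;
// System.currentTimeMillis() % asyncSyncers.length would behave similarly.
int pickSyncer(long txid, Object[] asyncSyncers) {
  return (int) (txid % asyncSyncers.length);
}
{code}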


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-06 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841093#comment-13841093
 ] 

Andrew Purtell commented on HBASE-8755:
---

This work looks pretty far along. If we can get it in soon I would be willing 
to try putting this into 0.98 so this major improvement can manifest in a 
release. Exciting results.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840001#comment-13840001
 ] 

Himanshu Vashishtha commented on HBASE-8755:


This is some awesome stuff happening here. 

I have a few comments apart from what Stack already mentioned.

1) log rolling thread safety: Log rolling happens in parallel with flush/sync. 
Currently, in FSHLog, sync call grabs the updateLock to ensure it has a 
non-null writer (because of parallel log rolling). How does this patch address 
the non-null writer? Or is it not needed anymore? Also, if you go for the 
updatelock in sync, that might result in deadlock.

2) Error handling: It is not very clear how flush/sync failures are being 
handled. For example, if a write fails for txid 10, it notifies AsyncSyncer 
that writer is done with txid 10. And, AsyncSyncer notifies the notifier 
thread, which finally notifies the blocked handler using notifyAll. The handler 
checks for the failedTxid here:
{code}
+  if (txid <= this.failedTxid.get()) {
{code}
Let's say there are two handlers waiting for sync, t1 on txid 8 and t2 on txid 
10. And, t1 wakes up on notification. Would t1 also get this exception? 
Wouldn't it be wrong, because txid 8 may have succeeded? Please correct me if I 
missed anything.

3) I think the current HLogPE doesn't do justice to the real use case. Almost 
*all* HLog calls are appendNoSync, followed by a sync call.
In the current HLogPE, we are calling append, which also does the sync. 
When I changed it to represent the above common case,
the performance numbers of the current FSHLog using HLogPE improve quite a bit. I 
still need to figure out the reason, but the HLogPE change affects the perf 
numbers considerably IMO.
For example, on a 5 node cluster, I see this difference on trunk:
Earlier: 
{code}
Summary: threads=3, iterations=10 took 218.590s 1372.432ops/s
{code}
Post HlogPE change:
{code}
Summary: threads=3, iterations=10 took 172.648s 1737.640ops/s
{code}
I think it would be great if we can test this with the correct HLogPE. What do 
you think Fenghua?

4) Perf numbers are super impressive. It would have been wonderful to have such 
numbers for lesser number of handler threads also (e.g., 5-10 threads). IMHO, 
that represents most common case scenario, but I could be wrong. I know this 
has been beaten to death in the discussions above, but just echoing my thoughts 
here. 

5) I should also mention that while working on a different use case, I was 
trying to bring a layer of indirection b/w the regionserver handlers and the sync 
operation (sync is the most costly affair in the whole HLog story). What resulted is 
a separate set of syncer threads, which do the work of flushing the edits and 
syncing to HDFS. This is what it looks like (see the sketch below):
a) The handlers append their entries to the FSHLog buffer as they do currently. 
b) They invoke the sync API. There, they wait on the syncer threads to do the work 
for them and notify. 
c) It results in a batched sync effort but without much extra locking/threads. 
Basically, it is similar to what you did here but minus the Writer & Notifier 
threads and minus the bufferLock.
I mentioned it here because with that approach, I see some perf improvement 
even with a smaller number of handler threads. And, it also keeps the current 
deferredLogFlush behavior. This work is still in progress and is still a 
prototype. It would be great to know your take on it.
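For illustration, a minimal sketch of the a)-c) flow under the assumptions stated above (it is not the 8755-syncer.patch code; the buffer flush and HDFS sync are left as comments):
{code}
// Illustrative group-sync sketch: handlers wait, a syncer thread batches flush + sync for everyone.
class GroupSyncSketch {
  private final Object syncedLock = new Object();
  private long syncedTxid = 0;   // highest txid covered by a completed sync

  // b) A handler waits until some syncer has covered its txid.
  void sync(long txid) throws InterruptedException {
    synchronized (syncedLock) {
      while (syncedTxid < txid) syncedLock.wait();
    }
  }

  // c) A syncer thread flushes the buffered edits, syncs once for the whole batch, and notifies.
  void syncerRun(long upToTxid) {
    // flush buffered edits and call writer.sync() here (omitted)
    synchronized (syncedLock) {
      if (upToTxid > syncedTxid) syncedTxid = upToTxid;
      syncedLock.notifyAll();
    }
  }
}
{code}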

Here are some numbers on a 5-node cluster; hdfs 2.1.0; hbase is trunk; client 
on a different node. I haven't tested it with a larger number of threads, but it 
would be good to compare, I think (it's 2am here...). It uses 3 syncers at the 
moment (and varying that would be a good thing to experiment with too). 
Also, 'w/o patch' in the table below means trunk + the HLogPE patch.

||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|3|172.648s|1737.640ops/s|170.332s|1761.266ops/s|
|3|170.977s|1754.622ops/s|174.568s|1718.528ops/s|
|5|213.738s|2339.313ops/s|191.119s|2616.171ops/s|
|5|211.072s|2368.860ops/s|189.671s|2636.144ops/s|
|10|254.641s|3926.419ops/s|216.494s|4619.319ops/s|
|10|251.503s|3978.564ops/s|215.333s|4643.266ops/s|
|10|251.692s|3970.579ops/s|217.151s|4605.854ops/s|
|20|648.943s|6163.870ops/s|646.279s|6189.277ops/s|
|20|658.654s|6072.991ops/s|656.277s|6094.987ops/s|
|25|282.654s|8861.991ops/s|249.277s|10033.987ops/s|


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840074#comment-13840074
 ] 

stack commented on HBASE-8755:
--

Nice review [~himan...@cloudera.com]

On 3., above, yes, that is true.  HLogPE does not seem to be representative as 
you suggest.  But does your  change below

{code}
-hlog.append(hri, hri.getTable(), walEdit, now, htd, region.getSequenceId());
+// this is how almost all users of HLog use it (all but compaction calls).
+long txid = hlog.appendNoSync(hri, hri.getTable(), walEdit, clusters, now, htd,
+  region.getSequenceId(), true, nonce, nonce);
+hlog.sync(txid);
+
{code}

... bring it closer to a 'real' use case?  I see over in HRegion that we do a 
bunch of appendNoSync in minibatch or even in put before we call sync.   Should 
we append more than just one set of edits before we call the sync?
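
For example, a batched variant of the HLogPE loop might look like the sketch below. It reuses the appendNoSync call and variable names from the diff above; batchSize and createWALEdit() are hypothetical stand-ins for however the evaluation builds and sizes its batches.

{code}
// Hypothetical batching sketch for HLogPerformanceEvaluation (illustration only).
long lastTxid = -1;
for (int i = 0; i < batchSize; i++) {
  WALEdit walEdit = createWALEdit();   // hypothetical helper that builds one edit
  lastTxid = hlog.appendNoSync(hri, hri.getTable(), walEdit, clusters, now, htd,
      region.getSequenceId(), true, nonce, nonce);
}
hlog.sync(lastTxid);                   // one sync covers the whole batch, as HRegion's minibatch does
{code}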

I suppose on a regionserver with a load of regions loaded up on it, all these 
syncs can come crashing in on top of each other onto the underlying WAL in an 
arbitrary manner -- something Feng Honghua's patch mitigates somewhat by making it 
so syncs are done when FSHLog thinks it appropriate rather than when some 
arbitrary HRegion call thinks it right ... and this is probably part of the 
reason for the perf improvement.

Could we better regulate the sync calls so they are even less arbitrary?  Even 
them out?  It could make for better performance if there was a mechanism 
against syncs clumping together.

Looking at your patch, the syncer is very much like Feng Honghua's -- it is 
interesting that you two independently came up w/ a similar multithreaded syncing 
mechanism.  That would seem to 'prove' this is a good approach.  Feng's patch 
is much further along, with a bunch of cleanup of FSHLog.  Will wait on his 
comments on what he thinks of doing without AsyncWriter and AsyncNotifier.

Looks like your patch is far enough along for us to do tests comparing the 
approaches?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840663#comment-13840663
 ] 

Jonathan Hsieh commented on HBASE-8755:
---

I posted v4 for trunk on reviewboard and will be reviewing there.

https://reviews.apache.org/r/16052/


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840710#comment-13840710
 ] 

stack commented on HBASE-8755:
--

Let me look into changing HLogPE after Himanshu's suggestion above.  I'll add 
an option to batch up edits some.  My sense is that it will make the difference 
between the current code and Honghua's patch even larger when the thread count 
is low, but that it will not change the numbers when the thread count is high.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840733#comment-13840733
 ] 

Himanshu Vashishtha commented on HBASE-8755:


I agree on the HLogPE comment. Yes, grouping appendNoSync calls before calling sync 
is the right way to go. The existing one is way off the real usage.

I figured that the 'w/o patch' column in the table above contains Feng's patch too. 
I am re-running the experiments at the moment with three versions: Trunk, Trunk 
+ Feng's patch, Trunk + the Syncer approach. And, all versions have that HLogPE 
fix on them; will report back the numbers once done.

Going by what we are seeing here, batching sync calls is definitely the right 
way IMO.
I agree that Feng Honghua's patch has been tested well enough, and I really 
like the radical cleanup it does. The code reads pretty clean now, though it 
involves a far larger number of threads and synchronization machinery (which makes 
it more interesting to debug too :)). I just wanted to ensure that it is safe. 

The reason I mentioned this different approach is that it adds fewer 
threads (while keeping the current behaviour), and also shows improvement with 
a smaller number of handlers, which to me looks like a nice win over the current 
FSHLog. This is still in a prototype stage, and I absolutely don't want to 
block Feng's superb piece of work here. It would be good to know his thoughts 
on this. Thanks.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-05 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13840812#comment-13840812
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks very much to [~v.himanshu] and [~stack] for their review and further 
improvement suggestions. Currently I have some other stuff on hand to handle... 
I'll try my best to come back soon and read through the comments above. Thanks again.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839662#comment-13839662
 ] 

stack commented on HBASE-8755:
--

Here is more review on the patch.  Make the changes suggested below and I'll +1 
it.

(Discussion off-line w/ Feng on this issue helped me better understand this 
patch and put to rest any notion that there is an easier 'fix' than the one 
proposed here.  That said, there is much room for improvement, but this can be 
done in a follow-on.)

Remove these asserts rather than comment them out given they depended on a 
facility this patch removes.  Leaving them in will only make the next reader of 
the code -- very likely lacking the context you have -- feel uneasy thinking 
someone removed asserts just to get tests to pass.

-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());

On the below...

+import java.util.Random;

... using a Random for choosing an arbitrary thread for a list of 4 is 
heavyweight.  Can you not take last digit of timestamp or nano timestamp or 
some attribute of the edit instead?  Something more lightweight?

Please remove all mentions of AsyncFlush since it no longer exists:

// all writes pending on AsyncWrite/AsyncFlush thread with

Leaving it in will confuse readers when they can't find any such thread class.

Is this comment right?

// txid <= failedTxid will fail by throwing asyncIOE

Should it be >= failedTxid?

This should be volatile since it is set by AsyncSync and then used by the main 
FSHLog thread (you have an assert to check it not null -- maybe you ran into an 
issue here already?):

 +  private IOException asyncIOE = null;

bq. +  private final Object bufferLock = new Object();

'bufferLock' is a very generic name. Could it be more descriptive?  It is a 
lock held for a short while while AsyncWriter moves queued edits off the 
globally seen queue to a local queue just before we send the edits to the WAL.  
You add a method named getPendingWrites that requires this lock be held.  Could 
we tie the method and the lock together better? Name it pendingWritesLock?  
(The name of the list to hold the pending writes is pendingWrites).

bq. ...because the HDFS write-method is pretty heavyweight as far as locking is 
concerned.

I think the heavyweight referred to in the above is hbase locking, not hdfs 
locking as the comment would imply.  If you agree (you know this code better 
than I), please adjust the comment.

Comments on what these threads do will help the next code reader.  AsyncWriter 
does the appending of edits to HDFS.  AsyncSyncer needs a comment because the name is 
oxymoronic (though it makes sense in this context).  In particular, a comment 
should draw out why we need so many instances of a syncer thread, because 
everyone's first thought here is going to be: why do we need this?  Ditto on the 
AsyncNotifier.  In the reviews above, folks have asked why we need this thread 
at all, and a code reader will likely think the same on a first pass.  
Bottom line: your patch raised questions from reviewers; it would be cool if 
those questions were answered in code comments where possible so they do 
not come up again.

+  private final AsyncWriter   asyncWriter;
+  private final AsyncSyncer[] asyncSyncers = new AsyncSyncer[5];
+  private final AsyncNotifier asyncNotifier;

You remove the LogSyncer facility in this patch.  That is good (need to note 
this in release notes).  Your patch should remove the optional flush config 
from hbase-default.xml too since it no longer is relevant.

-this.optionalFlushInterval =
-  conf.getLong("hbase.regionserver.optionallogflushinterval", 1 * 1000);

I see it here...

hbase-common/src/main/resources/hbase-default.xml:
<name>hbase.regionserver.optionallogflushinterval</name>

A small nit is you might look at other threads in hbase and see how they are 
named... 

+asyncWriter = new AsyncWriter("AsyncHLogWriter");

Ditto here:

+  asyncSyncers[i] = new AsyncSyncer("AsyncHLogSyncer" + i);

Probably make the number of asyncSyncers a configuration (you don't have to put 
the option out in hbase-default.xml; just make it so that if someone is 
reading the code and trips over this issue, they can change it by adding to 
hbase-site.xml w/o having to change code -- let's not reproduce the hard-coded 
'80' that is in the head of dfsclient we discussed yesterday -- smile).
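
A sketch of what that could look like; the property name below is made up for illustration, the actual key would be chosen when writing the patch:

{code}
// Hypothetical: make the syncer count tunable via hbase-site.xml, defaulting to 5.
int syncerCount = conf.getInt("hbase.regionserver.hlog.asyncer.number", 5);
AsyncSyncer[] asyncSyncers = new AsyncSyncer[syncerCount];
for (int i = 0; i < asyncSyncers.length; i++) {
  asyncSyncers[i] = new AsyncSyncer("AsyncHLogSyncer" + i);
}
{code}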

... and here: asyncNotifier = new AsyncNotifier("AsyncHLogNotifier");

Not important but check out how other threads are named in hbase.  It might be 
good if these better align.

Maybe make a method for shutting down all these threads, or use the 
Threads#shutdown method in Threads.java?
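
For example, a small helper along these lines (a sketch only, assuming the AsyncWriter/AsyncSyncer/AsyncNotifier classes extend Thread and that the Threads utility offers a shutdown(Thread) join helper):

{code}
// Hypothetical helper that tears down the whole async write pipeline in one place.
private void shutdownAsyncThreads() {
  asyncNotifier.interrupt();
  Threads.shutdown(asyncNotifier);       // joins, tolerating InterruptedException
  for (AsyncSyncer syncer : asyncSyncers) {
    syncer.interrupt();
    Threads.shutdown(syncer);
  }
  asyncWriter.interrupt();
  Threads.shutdown(asyncWriter);
}
{code}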

bq. LOG.error("Exception while waiting for AsyncNotifier threads to die", e);

Do LOG.error(Exception 

[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838679#comment-13838679
 ] 

stack commented on HBASE-8755:
--

Sorry for the delay getting back to this [~fenghh] and thanks for the 
explanation.

I am having trouble reviewing the patch because I am trying to understand what 
is going on here in FSHLog.  It is hard to follow (not your patch necessarily 
but what is there currently) in spite of multiple reviews.  I keep trying to 
grok what is going on because this is critical code.

The numbers are hard to argue with, and the patch does some nice cleanup of FSHLog 
which makes it easier to understand.  We could commit this patch and then work 
on undoing the complexity that is rife here; your patch adds yet more because 
it adds interacting threads w/ new synchronizations, notifications, 
AtomicBoolean states, etc., which cost performance-wise, but at least it is 
clearer what is going on and we have tools for comparing approaches now.  We 
could work on simplification and removal of sync points in a follow-on (see below 
for a note on one approach).

I now get why we need multiple syncers.  It is a little counter-intuitive: on the 
one hand we want to batch up edits more to get more performance, but on the 
other we have to sync more often because a sync stays outstanding for too much 
time, so much time that it holds up handlers too long.

+ I am trying to understand why we keep the edits aside in a linked-list.  This 
was there before your time; you just continue the practice.  The original 
comment says "We keep them cached here instead of writing them to HDFS 
piecemeal, because the HDFS write-method is pretty heavyweight as far as 
locking is concerned."  Yet, when we eventually flush the edits, we don't do 
anything special; we just call write on the dfsoutputstream.  We are not 
avoiding locking in hdfs.  It must be the hbase flush/update locking that is 
being referred to here.
+ AsyncSyncer is a confounding name for a class -- but it makes sense in this 
context.  The flush object in this thread is a syncer synchronization object 
not for memstore flushes... as I thought it was (there is use of flush in here 
when it probably should be sync to be consistent).

Off-list, a few other lads are interested in reviewing this patch (it is a 
popular patch!)... our [~j...@cloudera.com] and possibly 
[~himan...@cloudera.com], because they are getting stuck in this area.  If they 
don't get to it soon, I'll commit unless there is an objection.





[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-12-03 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838687#comment-13838687
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks [~stack]. Review from [~j...@cloudera.com], [~himan...@cloudera.com], and 
other experienced folks is also welcome; it's better to have this patch reviewed 
by as many people as possible.

Any question on this patch is welcome :-)



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-25 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832233#comment-13832233
 ] 

Feng Honghua commented on HBASE-8755:
-

Tried 4 asyncSyncer threads; below are the results. A bit worse than 5 threads, 
but looks acceptable?
||threads||ops-wo-patch||ops-w-patch||ops-diff||
|1|1716|1572|-8.4%|
|3|3179|3189|+0.3%|
|5|6091|5593|-8.1%|
|10|8760|9450|+7.8%|
|25|13019|18055|+38.7%|
|50|14995|26597|+77.3%|
|100|18824|51441|+173.2%|
|200|18144|61531|+239.1%|

Additional explanation on correctness when introducing extra asyncSyncer 
threads:
- When a txid (t0) is notified, all txids smaller than t0 must already be written 
to hdfs and sync-ed: before t0 is notified, t0 must be sync-ed by an 
asyncSyncer thread; before t0 is sync-ed, t0 must be written to hdfs by the 
asyncWriter thread; and before t0 is written to hdfs, all txids smaller than t0 must 
be written to hdfs. So the sync of t0 guarantees that all txids smaller than t0 
are sync-ed (either before the sync of t0, or by the sync of t0).
- When a txid (t0) can't find a free (idle) asyncSyncer thread and is added to a 
random one, it won't be sync-ed until that asyncSyncer thread is done with the 
txid at hand. But its entries have already been written to hdfs, and if any 
txid bigger than t0 (say t1) is successfully sync-ed by another parallel 
asyncSyncer thread, that sync guarantees t0 is also successfully sync-ed; hence 
when t1 is notified, t0 can also be correctly notified.

any further comments?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-23 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830678#comment-13830678
 ] 

Feng Honghua commented on HBASE-8755:
-

bq. Do these no longer pass?
=> yes, under the new thread model there is no explicit method to do the sync, and 
we can't tell whether there are outstanding deferred entries (the 
AsyncWriter/AsyncSyncer threads do the write/sync in a best-effort way)

bq. We have hard-coded 5 asyncSyncers? Why 5?
=> yes, I tried 2/3/5/10 and found 5 is the best number (2/3 have worse perf, 
10 has equal perf but introduces too many extra threads)

bq. If we fail to find a free syncer, i don't follow what is going on w/ 
choosing a random syncer and setting txid as in below
=> when we fail to find an idle syncer (they are all busy syncing), choosing a 
random syncer and setting its txid falls back to the behavior before the extra 
asyncSyncer threads were introduced: when asyncWriter pushes new entries to hdfs 
before an asyncSyncer has sync-ed the previously pushed ones, that asyncSyncer is 
notified of the newly pushed txid, but those txids will only be synced the next 
time around, after the asyncSyncer is done with the txids currently at hand. 
(Note that we use txidToFlush to record the txid each sync is for, and it can't 
change during a sync, while writtenTxid can change during a sync.)

To summarize: the sync operation is the most time-consuming phase. Under the old 
write model every write handler issues a separate sync directly for itself 
(unless it returns early via syncedTillHere). Under the new write model the 
separate threads significantly reduce the lock contention, but when there are 
only a few concurrent write threads, the benefit of reduced contention (fewer 
write threads, less benefit) can't offset the inefficiency of a single 
asyncSyncer thread: each sync pass can only cover a portion of the writes, and 
the write handlers whose entries are already in the buffer or pushed to hdfs 
still have to wait for that pass to complete, and can't proceed until the next 
sync pass is done.
Introducing extra asyncSyncer threads keeps the correctness of the model the 
same as before: there is still a single asyncWriter thread that pushes buffered 
entries to hdfs sequentially (txids increase sequentially), and when each 
asyncSyncer's sync is done, it is guaranteed that all smaller txids have been 
pushed to hdfs and successfully sync-ed.
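
A rough sketch of the syncer loop being described, reusing the field names mentioned above (writtenTxid, txidToFlush, syncedTillHere); this is illustrative only, assuming the syncer extends Thread and that writer.sync() persists everything appended so far:

{code}
// Sketch only (not the patch): each pass snapshots writtenTxid into txidToFlush;
// writtenTxid may keep advancing while the sync is in flight, but txidToFlush
// cannot, so the published watermark is always safe.
public void run() {
  try {
    while (!isInterrupted()) {
      long txidToFlush;
      synchronized (this) {
        while (writtenTxid <= syncedTillHere) {
          wait();                        // nothing new appended since last sync
        }
        txidToFlush = writtenTxid;       // snapshot for this sync pass
      }
      writer.sync();                     // persists every entry appended so far
      synchronized (this) {
        if (txidToFlush > syncedTillHere) {
          syncedTillHere = txidToFlush;  // smaller txids are covered as well
        }
      }
      // hand the new watermark to AsyncNotifier (omitted in this sketch)
    }
  } catch (InterruptedException | IOException e) {
    // the real code records the IOException (asyncIOE) and shuts down cleanly
  }
}
{code}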


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-23 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830681#comment-13830681
 ] 

Feng Honghua commented on HBASE-8755:
-

bq. => yes, under the new thread model there is no explicit method to do the sync, 
and we can't tell whether there are outstanding deferred entries (the 
AsyncWriter/AsyncSyncer threads do the write/sync in a best-effort way)

I meant that calling 'sync' can guarantee there are no outstanding deferred 
entries, but not calling 'sync' after a write can't guarantee there are 
outstanding deferred entries, since they can be sync-ed by the 
asyncWriter/asyncSyncer threads in the meantime. This is not the same behavior 
as under the old write model.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830753#comment-13830753
 ] 

Ted Yu commented on HBASE-8755:
---

bq. 2/3 have worse perf, 10 has equal perf but introduces too many extra threads
Do you remember how many extra threads were introduced ?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830755#comment-13830755
 ] 

stack commented on HBASE-8755:
--

bq. Do you remember how many extra threads were introduced ?

This question is answered above in my last review.  Why do you ask?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830356#comment-13830356
 ] 

stack commented on HBASE-8755:
--

I ran Feng's script except I left out trying 200 threads.

||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff||
|1|973.673|1027.039|1119.825|892.997|-15%|
|3|1303.891|2300.806|1400.848|2141.560|-7%|
|5|855.775|5842.657|873.990|5720.889|-2%|
|10|1093.330|9146.370|1090.158|9172.982|0%|
|25|1632.263|15316.160|1215.196|20572.813|+25%|
|50|2432.653|20553.691|1341.847|37262.070|+45%|
|100|4058.650|24638.734|1725.729|57946.527|57%|

This was stock hadoop 2.2 and tip of the hbase trunk.

Those are pretty big improvements once we pass ten concurrent threads.  
A small drop when there is only one writer is acceptable, I'd say.  Let me look 
again at the patch.




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830360#comment-13830360
 ] 

stack commented on HBASE-8755:
--

Have we looked at using the disruptor here?  MPSC seems to be what we have going 
on here, which the disruptor seems pretty well suited to.  Looking at the patch now...



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-22 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830516#comment-13830516
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks [~stack]

A small clarification of why we got such different ops-diff numbers: I used 
*ops-diff = (new - old) / old* and you used *ops-diff = (new - old) / new*. For 
example, for the 100-thread result the old number is 24638 and the new one is 
57946, so the ops-diff is *+135.2%* (the new number is 2.35 times the old one), 
while you got *57%* :-)
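
In code form, with the 100-thread numbers from the tables above:

{code}
// Same measurements, two different denominators:
double oldOps = 24638.734, newOps = 57946.527;
double diffVsOld = (newOps - oldOps) / oldOps;   // ~1.35 -> "+135%" (this convention)
double diffVsNew = (newOps - oldOps) / newOps;   // ~0.57 -> "57%"  (earlier table's convention)
{code}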



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830563#comment-13830563
 ] 

stack commented on HBASE-8755:
--

Pardon my bad math [~fenghh].  Below is the correction (the numbers look better now).

||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff||
|1|973.673|1027.039|1119.825|892.997|-15%|
|3|1303.891|2300.806|1400.848|2141.560|-7%|
|5|855.775|5842.657|873.990|5720.889|-2%|
|10|1093.330|9146.370|1090.158|9172.982|0%|
|25|1632.263|15316.160|1215.196|20572.813|+34%|
|50|2432.653|20553.691|1341.847|37262.070|+81%|
|100|4058.650|24638.734|1725.729|57946.527|+135%|



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830604#comment-13830604
 ] 

stack commented on HBASE-8755:
--

Looking at patch...

Do these no longer pass?

-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());

We have hard-coded 5 asyncSyncers?  Why 5?

We are starting 7 threads to run this new write model : the 5 syncers, a 
notifier, and a writer (I suppose we are also removing one, the old LogSyncer 
-- so 6 new threads).

We set the below outside of a sync block:

+  this.asyncWriter.setPendingTxid(txid);

Could we set the txid out of order here?   If so, will that be a problem?  
Hmm... it seems not.  If we get a lower txid, we just return and prevail w/ the 
highest we've seen.
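
In other words, something like this hypothetical sketch of setPendingTxid (the pendingTxid field name is assumed for illustration):

{code}
// Lower or duplicate txids are ignored; only a higher txid wakes the writer.
public synchronized void setPendingTxid(long txid) {
  if (txid <= this.pendingTxid) {
    return;                 // out-of-order caller: a later txid already covers it
  }
  this.pendingTxid = txid;
  notifyAll();              // wake AsyncWriter to append up to pendingTxid
}
{code}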

If we fail to find a free syncer, i don't follow what is going on w/ choosing a 
random syncer and setting txid as in below (Generally comments in patch are 
good -- this looks like a section that could do w/ some):

+  this.lastWrittenTxid = this.txidToWrite;
+  boolean hasIdleSyncer = false;
+  for (int i = 0; i < asyncSyncers.length; ++i) {
+if (!asyncSyncers[i].isSyncing()) {
+  hasIdleSyncer = true;
+  asyncSyncers[i].setWrittenTxid(this.lastWrittenTxid);
+  break;
+}
+  }
+  if (!hasIdleSyncer) {
+int idx = rd.nextInt(asyncSyncers.length);
+asyncSyncers[idx].setWrittenTxid(this.lastWrittenTxid);
+  }

Thanks.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-21 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829631#comment-13829631
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack] : I fixed the downgrade for 5 threads and below; here are the test results. 
The write throughput capacity after tuning is *about 3.5 times* that before 
tuning, while the perf stays almost equal when the thread number is <= 5. (For 
protobuf/test compatibility reasons I based this on hbase trunk 0.97 r1516083.)

{code}
for i in 1 3 5 10 25 50 100 200; do
  for j in 1; do
    ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation \
      -path /user/h_fenghonghua/new-thread-v2/ -verify -threads ${i} \
      -iterations 100 -keySize 50 -valueSize 100 > log-patch${i}.${j}.txt;
    grep "Summary: " log-patch${i}.${j}.txt
  done;
done
{code}

||threads||time-without-patch||ops-without-patch||time-with-patch||ops-with-patch||ops-diff||
|1|582|1716|630|1586|-7.5%|
|3|943|3179|951|3153|-0.8%|
|5|820|6091|847|5899|-3.1%|
|10|1141|8760|983|10166|+16%|
|25|1920|13019|1286|19426|+49.2%|
|50|3334|14995|1627|30715|+104.8%|
|100|5312|18824|1925|51943|+185%|
|200|11022|18144|3229|61922|+241.2%|

[~jmspaggi] : I attached patches for the latest trunk (HBASE-8755-trunk-v4.patch) 
and for the latest 0.96 (HBASE-8755-0.96-v0.patch). Thanks very much if you can 
run similar comparison perf tests, and please feel free to raise any issue :-)



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829656#comment-13829656
 ] 

stack commented on HBASE-8755:
--

I'm starting tests running now.





[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-12 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820068#comment-13820068
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


[~fenghh], any chance to rebase on the latest 0.96 and optimize for fewer than 5 
clients? If so I can give it a spin on a small cluster and run the PerfEval 
suite for 24 h to see the results.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-11-12 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820794#comment-13820794
 ] 

Feng Honghua commented on HBASE-8755:
-

[~jmspaggi] sure, I'm tuning for fewer than 5 threads, and will provide a patch 
based on the latest 0.96 once the bottleneck is found and the tuning is done.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808163#comment-13808163
 ] 

Sergey Shelukhin commented on HBASE-8755:
-

nice. Thanks!


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807214#comment-13807214
 ] 

Sergey Shelukhin commented on HBASE-8755:
-

Question from above: 
{quote}
...is a separate notifier thread necessary? It doesn't do any blocking operations 
other than interacting with the flusher thread, or taking the syncedTillHere lock, 
which looks like it should be uncontended most of the time.
Couldn't the flusher thread run the ~4 lines that set syncedTillHere?
{quote}


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-28 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807618#comment-13807618
 ] 

Hangjun Ye commented on HBASE-8755:
---

[~sershe], that's a very good question.
We found syncedTillHere.notifyAll() was a blocking operation and it might take 
a long time if many threads were waiting on it (which also surprised us). So we 
put this logic in a separate thread.
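
To make that trade-off concrete, here is a small, hedged Java sketch of the idea 
(invented names, not the patch's code): the flusher only bumps the syncedTillHere 
watermark and pokes a dedicated notifier thread, so the potentially slow notifyAll() 
over many parked handlers happens off the sync path.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class AsyncNotifierSketch {
  private final AtomicLong syncedTillHere = new AtomicLong(0);
  private final Object notifierSignal = new Object(); // flusher -> notifier hand-off
  private final Object syncWaiters = new Object();    // put handlers park here

  // Called by the flusher after each successful sync: cheap, so the next sync can start.
  void onSyncCompleted(long syncedTxid) {
    syncedTillHere.accumulateAndGet(syncedTxid, Math::max);
    synchronized (notifierSignal) {
      notifierSignal.notify();          // wake the single notifier thread
    }
  }

  // The dedicated AsyncNotifier loop: pays the notifyAll() cost off the sync path.
  void notifierLoop() throws InterruptedException {
    long published = 0;
    while (true) {
      synchronized (notifierSignal) {
        while (syncedTillHere.get() <= published) notifierSignal.wait();
      }
      published = syncedTillHere.get();
      synchronized (syncWaiters) {
        syncWaiters.notifyAll();        // can be slow when many handlers are parked
      }
    }
  }

  // Put handlers block here until the watermark covers their txid.
  void waitForSync(long txid) throws InterruptedException {
    synchronized (syncWaiters) {
      while (syncedTillHere.get() < txid) syncWaiters.wait();
    }
  }

  public static void main(String[] args) throws Exception {
    AsyncNotifierSketch s = new AsyncNotifierSketch();
    Thread notifier = new Thread(() -> {
      try { s.notifierLoop(); } catch (InterruptedException e) { }
    });
    notifier.setDaemon(true);
    notifier.start();
    Thread handler = new Thread(() -> {
      try { s.waitForSync(1); System.out.println("txid 1 is durable"); }
      catch (InterruptedException e) { }
    });
    handler.start();
    Thread.sleep(100);     // let the handler park first
    s.onSyncCompleted(1);  // the "flusher" reports a completed sync
    handler.join();
  }
}
{code}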



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803932#comment-13803932
 ] 

stack commented on HBASE-8755:
--

Five datanodes with fusionio.  hbase tip of 0.96 branch and hadoop-2.1.0-beta.  
HLogPE on master node.

||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|1|1048.033s|954.168ops/s|1100.423s|908.741ops/s|
|1|1042.126s|959.577ops/s|1156.557s|864.635ops/s|
|1|1052.601s|950.028ops/s|1143.271s|874.683ops/s|
|5|904.176s|5529.896ops/s|1916.229s|2609.292ops/s|
|5|910.469s|5491.675ops/s|1911.841s|2615.280ops/s|
|5|925.778s|5400.863ops/s|1970.565s|2537.344ops/s|
|50|2699.752s|18520.221ops/s|1889.877s|26456.748ops/s|
|50|2689.678s|18589.586ops/s|1922.716s|26004.881ops/s|
|50|2711.144s|18442.398ops/s|1893.439s|26406.977ops/s|
|75|4945.563s|15165.108ops/s|1997.553s|37545.938ops/s|
|75|4852.779s|15455.063ops/s|1992.425s|37642.570ops/s|
|75|4921.685s|15238.684ops/s|-|-|
|100|6224.527s|16065.479ops/s|2086.691s|47922.766ops/s|
|100|6195.727s|16140.156ops/s|2091.869s|47804.145ops/s|

Diffs are small with 1 thread only.  It's bad at 5 threads, but thereafter the 
patch starts to shine.  If we could make the 5-thread case better, we could commit 
this patch.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-24 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804957#comment-13804957
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack] thanks a lot for the detailed perf test. Agree with you; we'll try to 
make the 5-thread case better.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792389#comment-13792389
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack] thanks a lot :-)

btw: the HLogPE withoutPatch/withPatch test results against 0.94.3 (our internal 
branch) done by [~liushaohui] match the result against the hdfs sequence file 
described in the 'Description' of this JIRA, quoted below (17K vs. 68K):
bq. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
proposed a new write thread model for writing hdfs sequence files, and the 
prototype implementation shows a 4X improvement in throughput (from 17000 to 
70000+).


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792422#comment-13792422
 ] 

stack commented on HBASE-8755:
--

Does that mean we can just use HLogPE going forward for evaluating this patch 
and not have to set up a cluster, because it gives the same results as a cluster 
does?  Thanks [~fenghh]


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792430#comment-13792430
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack]: Agree. It seems HLogPE is enough for profiling/evaluating; shall we run 
YCSB on a cluster later for double verification? :-)


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-11 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792433#comment-13792433
 ] 

Hangjun Ye commented on HBASE-8755:
---

To clarify, an hdfs cluster is still needed to store the sequence file.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-10 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792279#comment-13792279
 ] 

Liu Shaohui commented on HBASE-8755:


[~stack] [~fenghh] [~yehangjun]
We redid the same HLogPE comparison test using hbase trunk. The results follow.

Test env:
hdfs: cdh 4.1.0, five datanodes, each node with 12 sata disks.
hbase: hbase trunk 0.97 r1516083 (because in r1516084 trunk updated protobuf to 
2.5, which is incompatible with the protobuf 2.4 used in cdh 4.1.0).
HLogPE is run on one of these datanodes, so one replica of the hlog's block will be 
on the local datanode.

||Thread number ||Time without Patch(s) || Ops without Patch || Time with 
Patch(s) ||Ops with Patch || Time diff %|| Ops diff % ||
|1|580.309|1723.22|624.709|1600.745|-7.65|-7.11|
|1|591.177|1691.541|631.34|1583.932|-6.79|-6.36|
|1|591.948|1689.338|634.518|1575.999|-7.19|-6.71|
|5|794.034|6296.959|1201.563|4161.247|-51.32|-33.92|
|5|781.033|6401.778|1191.776|4195.419|-52.59|-34.46|
|5|805.597|6206.577|1187.179|4211.665|-47.37|-32.14|
|50|3222.659|15515.139|1815.586|27539.316|43.66|77.50|
|50|3191.131|15668.426|1821.956|27443.033|42.91|75.15|
|50|3222.407|15516.352|1817.754|27506.473|43.59|77.27|
|75|4517.149|16603.393|2024.359|37048.766|55.19|123.14|
|75|4498.987|16670.42|2016.899|37185.797|55.17|123.06|
|75|4554.122|16468.598|2037.155|36816.051|55.27|123.55|
|100|5186.292|19281.598|2147.581|46564.016|58.59|141.49|
|100|5181.344|19300.012|2135.768|46821.563|58.78|142.60|
|100|5189.396|19270.064|2143.529|46652.039|58.69|142.10|


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792293#comment-13792293
 ] 

stack commented on HBASE-8755:
--

[~fenghh] Our results are the same?


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-10 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792301#comment-13792301
 ] 

Feng Honghua commented on HBASE-8755:
-

[~stack]: Yes, it seems this patch against trunk shows an obvious degradation 
compared to the run against 0.94.3 (our internal branch):
  1) 5 threads: withPatch has worse perf than withoutPatch (33% ops degradation, 
4.2K vs. 6.3K)
  2) 100 threads: withPatch's perf is about 2.5X that of withoutPatch (46.6K vs. 
19.2K)

A short summary:
  1) withoutPatch, the max ops of HLog is less than 20K (19K for trunk and 17K 
for 0.94.3)
  2) withPatch, the max ops of HLog is more than 45K (46K for trunk and 68K for 
0.94.3)
  3) for trunk, withPatch can have even worse perf than withoutPatch (about 33% 
degradation)

We'll try to figure out why withPatch performs worse than withoutPatch on 
trunk, and try to ensure the performance is about equal when stress is low while 
still keeping an obvious improvement when stress is high. :-)

[~stack]: would you please redo the test using 75/100 threads to re-confirm 
whether the ops improvement matches our tests? (we see 37K vs 16K for 75 threads 
and 46K vs 19K for 100 threads)

[~zjushch]: which version of HBase did you apply the patch to, trunk or 0.94? I 
wonder whether the reason for our difference is the same as for stack's.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-10-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792313#comment-13792313
 ] 

stack commented on HBASE-8755:
--

[~fenghh] will do.  Sorry, meant to do it earlier.  Will try to do some 
profiling myself too, to see why it is worse at low thread counts.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-26 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779581#comment-13779581
 ] 

Liu Shaohui commented on HBASE-8755:


[~stack] [~fenghh]
We redid the comparison test using HLogPE. Here are the results.

Test env:

hdfs: cdh 4.1.0, five datanodes, each node with 12 sata disks.
hbase: 0.94.3
HLogPE is run on one of these datanodes, so one replica of the hlog's block will be 
on the local datanode.

The params of HLogPE are: -iterations 1000000 -keySize 50 -valueSize 100, which 
are the same as in stack's tests.

{code} 
for i in 1 5 50 75 100; do 
  for j in 1 2 3; do
    ./bin/hbase \
      org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify \
      -threads ${i} -iterations 1000000 -keySize 50 -valueSize 100 > \
      log-patch${i}.${j}.txt
    grep "Summary:" log-patch${i}.${j}.txt
  done
done
{code} 

||Thread count||Time without Patch (s)||Time with Patch (s)||Time diff %||
|1|579.380|625.937|-8.03|
|1|580.307|630.346|-8.62|
|1|577.853|654.20|-13.21|
|5|799.579|785.696|1.73|
|5|795.013|780.642|1.80|
|5|826.270|781.909|5.36|
|50|3290.482|1165.773|64.57|
|50|3298.387|1167.992|64.58|
|50|3224.495|1154.921|64.18|
|75|4450.760|1253.448|71.83|
|75|4506.143|1269.806|71.82|
|75|4516.453|1245.954|72.41|
|100|5561.074|1493.102|73.15|
|100|5616.810|1496.263|73.36|
|100|5612.268|1468.500|73.83|


a, When the thread number is 1, we see that the performance in our test is about 
40% better than in stack's test, in both the old and the new thread model. 
[~stack], what is the hdfs version in your test, and are there any special 
configs? 
b, When the thread number is 5, we do not see the -50% downgrade.
 


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-26 Thread Liu Shaohui (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779607#comment-13779607
 ] 

Liu Shaohui commented on HBASE-8755:


Updated the result table and added the ops diff:
||Thread number|| Time without Patch || Ops without Patch||Time with Patch || 
Ops with Patch || Time diff % || Ops diff % ||
|1|579.38|1725.983|625.937|1597.605|-8.04|-7.44|
|1|580.307|1723.226|630.346|1586.43|-8.62|-7.94|
|1|577.853|1730.544|654.205|1528.573|-13.21|-11.67|
|5|799.579|6253.291|785.696|6363.785|1.74|1.77|
|5|795.013|6289.206|780.642|6404.984|1.81|1.84|
|5|826.27|6051.291|781.909|6394.606|5.37|5.67|
|50|3290.482|15195.343|1165.773|42890|64.57|182.26|
|50|3298.387|15158.925|1167.992|42808.516|64.59|182.40|
|50|3224.495|15506.304|1154.921|43293.004|64.18|179.20|
|75|4450.76|16851.055|1253.448|59834.953|71.84|255.08|
|75|4506.143|16643.945|1269.806|59064.141|71.82|254.87|
|75|4516.453|16605.951|1245.954|60194.84|72.41|262.49|
|100|5561.074|17982.137|1493.102|66974.656|73.15|272.45|
|100|5616.81|17803.699|1496.263|66833.172|73.36|275.39|
|100|5612.268|17818.107|1468.5|68096.695|73.83|282.18|

Time diff % = (Time without Patch - Time with Patch) / Time without Patch * 100
Ops diff % = (Ops with Patch - Ops without Patch) / Ops without Patch * 100
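
For concreteness, a throwaway check (hypothetical class name, not part of any patch here) that recomputes the two diff columns for the first 50-thread row:

{code}
// Recomputes the diff columns for the row |50|3290.482|15195.343|1165.773|42890|...|
public class DiffCheck {
  public static void main(String[] args) {
    double timeWithout = 3290.482, timeWith = 1165.773;   // seconds
    double opsWithout  = 15195.343, opsWith  = 42890.0;   // ops/s
    System.out.printf("Time diff %% = %.2f%n", (timeWithout - timeWith) / timeWithout * 100); // ~64.57
    System.out.printf("Ops diff %%  = %.2f%n", (opsWith - opsWithout) / opsWithout * 100);    // ~182.26
  }
}
{code}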

[~stack] What are the hdfs and hbase versions of your test? We may redo the 
tests on a cluster with the same hdfs and hbase versions as yours.




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-23 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774603#comment-13774603
 ] 

Feng Honghua commented on HBASE-8755:
-

bq.If I weren't in Germany right with no VPN access I'd offer to load 0.94 with 
the patch onto one of our test clusters and run some workload on it...

[~lhofhansl] Wondering if you now have access to your test clusters; if possible, 
would you please also help run the performance comparison test on them? (I'm 
confused why no one else gets test results with improvement comparable to ours, 
while we get almost the same improvement consistently across multiple tests.)



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775651#comment-13775651
 ] 

stack commented on HBASE-8755:
--

bq. What was implication of not shutting this?   Were tests failing or is this 
just make-work?

I did not save avg ops/s. It was a set amount of work.

bq. 2. is the write load against a single node, or five nodes? – to confirm the 
throughput is per-node or per-cluster(with five nodes)

I had a client writing to a WAL hosted in hdfs on a 5-node HDFS cluster.

Thanks [~fenghh]



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-19 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771714#comment-13771714
 ] 

Hangjun Ye commented on HBASE-8755:
---

Thanks [~stack] for doing this experiment! The result looks positive in some 
way.
China is on holiday (Mid-Autumn Festival) until this Saturday; we will check 
your suggestion after the holiday.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772189#comment-13772189
 ] 

stack commented on HBASE-8755:
--

[~fenghh] Happy holidays!  I didn't profile to see where the time is spent.  That 
would be the next step, I'd say.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770888#comment-13770888
 ] 

stack commented on HBASE-8755:
--

Just parking the numbers here.  It takes the same time to write 5x as much (5M 
entries by 5 threads) as 1M entries by 1 thread.  50x takes 6x longer (50 
threads writing 50M entries).

{code}
/tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
991.258s 1008.819ops/s
/tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
924.881s 1081.220ops/s
/tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
979.312s 1021.125ops/s
/tmp/log-patch50.1.txt:2013-09-17 23:29:45,188 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=1000000 took 
2960.095s 16891.350ops/s
/tmp/log-patch50.2.txt:2013-09-18 00:22:48,849 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=1000000 took 
2924.844s 17094.930ops/s
/tmp/log-patch50.3.txt:2013-09-18 01:15:58,646 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=1000000 took 
2927.617s 17078.736ops/s
/tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=1000000 took 
950.968s 5257.800ops/s
/tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=1000000 took 
939.312s 5323.045ops/s
/tmp/log-patch5.3.txt:2013-09-17 22:39:56,011 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=1000000 took 
947.183s 5278.811ops/s
{code}
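
As a quick sanity check on the Summary lines above (a throwaway snippet, not part of HLogPerformanceEvaluation), the reported ops/s is just threads * iterations / elapsed seconds:

{code}
// Reproduce the ops/s figures of the first and last Summary lines above.
public class OpsPerSecondCheck {
  public static void main(String[] args) {
    System.out.println(1L  * 1_000_000 / 991.258);   // ~1008.8 ops/s for threads=1
    System.out.println(50L * 1_000_000 / 2960.095);  // ~16891.3 ops/s for threads=50
  }
}
{code}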



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771587#comment-13771587
 ] 

stack commented on HBASE-8755:
--

I was wrong that HLogPE called doWrite.  It calls append as an HRegionServer 
would.

Here are the numbers.  They confirm what [~zjushch] found way back up top of 
this issue.

I was running HLogPE against a five-node HDFS cluster.  The DataNodes were 
persisting to a fusionio drive, so there was little friction at the drive.

Below are the seconds elapsed running the following:

{code}
$ for i in 1 5 50; do  for j in 1 2 3; do ./bin/hbase --config 
/home/stack/conf_hbase 
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify 
-threads ${i} -iterations 1000000 -nocleanup -keySize 50 -valueSize 100 > 
/tmp/log-patch${i}.${j}.txt; done; done
{code}

||Thread Count||WithoutPatch||WithPatch||%diff||
|1|991.258|1125.208|-11.90|
|1|924.881|1137.754|-18.70|
|1|979.312|1142.959|-14.31|
|5|950.968|1914.448|-50.32|
|5|939.312|1918.188|-51.03|
|5|947.183|1939.806|-51.17|
|50|2960.095|1918.808|54.26|
|50|2924.844|1933.020|51.30|
|50|2927.617|1955.358|49.72|

So, about 20% slower for single-threaded writes, about 50% slower with five 
threads writing concurrently, BUT 50% faster with 50 concurrent threads (our 
current default).

Can we have the best of both worlds somehow, where we switch to this new model 
under high contention?
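
Purely as an illustration of that idea (nothing like this exists in the patch; every name below is made up), such a switch could count the callers currently waiting on sync and only route through the async pipeline above some threshold:

{code}
// Hypothetical sketch only -- not HBase code. Routes a sync either directly to
// the writer (low contention) or through the async batching pipeline (high
// contention), based on how many callers are currently waiting.
import java.util.concurrent.atomic.AtomicInteger;

public class AdaptiveSyncRouter {
  private final AtomicInteger waiters = new AtomicInteger();
  private final int threshold;

  public AdaptiveSyncRouter(int threshold) { this.threshold = threshold; }

  /** syncDirect and syncViaAsyncPipeline stand in for the two code paths. */
  public void sync(Runnable syncDirect, Runnable syncViaAsyncPipeline) {
    int concurrent = waiters.incrementAndGet();
    try {
      if (concurrent <= threshold) {
        syncDirect.run();             // few writers: one cheap direct sync
      } else {
        syncViaAsyncPipeline.run();   // many writers: batch via the async threads
      }
    } finally {
      waiters.decrementAndGet();
    }
  }
}
{code}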




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770455#comment-13770455
 ] 

stack commented on HBASE-8755:
--

Trying this on hdfs.  It takes WAY longer.  Threads stuck here:

{code}
t1 prio=10 tid=0x7f49fd4fc800 nid=0xfc4 in Object.wait() 
[0x7f49d0d99000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1795)
- locked 0x000423dd1cc8 (a java.util.LinkedList)
at 
org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1689)
at 
org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1582)
at 
org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1567)
at 
org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
at 
org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:135)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1072)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1195)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:910)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:844)
at 
org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:838)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation$HLogPutBenchmark.run(HLogPerformanceEvaluation.java:110)
at java.lang.Thread.run(Thread.java:662)
{code}

Here are numbers I have so far for WITHOUT patch:


/tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
991.258s 1008.819ops/s
/tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
924.881s 1081.220ops/s
/tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=1000000 took 
979.312s 1021.125ops/s
/tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=1000000 took 
950.968s 5257.800ops/s
/tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO  [main] 
wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=1000000 took 
939.312s 5323.045ops/s

Will clean up later, but the write rate is constant whether 1 or 5 threads.  Will 
see when 50.  Looks like I need to modify the HLogPE.  It calls FSHLog#doWrite 
directly, which is not as interesting since it by-passes locks in FSHLog when 
it does not call append.

Will be back.


[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-08 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761563#comment-13761563
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks [~stack]. Looking forward to your test results on hdfs.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-09-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761065#comment-13761065
 ] 

stack commented on HBASE-8755:
--

Starting in on taking a look at this.  I tried HLogPerformanceEvaluation.  Going 
via ycsb would seem to add noise and indirection.  I ran with 1, 5, and 50 
threads w/ sizes that are like Honghua's (key of 50 and value of 150).

I ran the test like this on trunk using localfs:

{code}
$ for i in 1 5 50; do  for j in 1 2 3; do ./bin/hbase 
org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify 
-threads ${i} -iterations 1000000 -nocleanup -verbose -keySize 50 -valueSize 
100 > /tmp/log-patch${i}.${j}.txt; done; done
{code}

Needs fixes over in HBASE-9460 for above to work on localfs.

With localfs, the patch is twice as slow.  localfs does not support sync, so 
this is probably what makes the difference -- the extra threads only pay off 
against the dfsclient's stutter around sync.  Let me try it up on hdfs next.
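
To see the difference localfs makes (a rough probe under assumed defaults, not from this issue; the file path and class name are made up), the cost being amortized by the async threads is the blocking hflush/sync on the output stream, which on HDFS waits for pipeline acks but on the local filesystem is little more than a buffer flush:

{code}
// Rough probe: time one hflush() on whatever filesystem core-site.xml points at.
// On HDFS this blocks on pipeline acks; on the local filesystem it returns almost
// immediately, which is why batching syncs buys nothing there. Older Hadoop (1.x)
// exposes the same operation as sync() instead of hflush().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncCostProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/sync-probe"), true);
    try {
      out.write(new byte[100]);
      long start = System.nanoTime();
      out.hflush();
      System.out.println("hflush took " + (System.nanoTime() - start) / 1000 + " us");
    } finally {
      out.close();
    }
  }
}
{code}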

Below are numbers.  I ran each test three times: i.e. three times without patch 
with one thread, then three times with patch and one thread, etc.

Each thread did 1M puts.  The table shows the time for the test to complete in 
seconds, so less is better.

||Thread Count||WithoutPatch||WithPatch|| 
|1|4.895|29.088|
|1|4.856|29.544|
|1|4.901|29.326|
|5|24.173|53.974|
|5|24.013|55.976|
|5|23.390|55.858|
|50|253.773|454.147|
|50|247.095|443.215|
|50|254.044|449.043|




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721960#comment-13721960
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


Ok. I did some other tries and here are the results.

jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2.txt
421428.8
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2-8755.txt
427172.1
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0.txt
419673.3
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0-8755.txt
432413.9

This is elapsed time. Between each iteration I totally delete (rm -rf) the 
hadoop directories, stop all the java processes, etc. The test is 10M randomWrite.

So unfortunately I have not been able to see any real improvement. For YCSB, is 
there any specific load I should run to be able to see something better than 
without 8755? I guess it's a write-intensive load that we want? Also, I have 
tested this on a pseudo-distributed instance (no longer a standalone one), but 
I can dedicate 4 nodes to a test if required...



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-28 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721973#comment-13721973
 ] 

Feng Honghua commented on HBASE-8755:
-

[~jmspaggi], thanks for your test. Some questions about it: is it against real 
HDFS? How many data-nodes and RS? What is the write pressure (client number, 
write thread number)? What total throughput do you get?

Yes, this jira aims for throughput improvement under write-intensive load. It 
should be tested and verified under write-intensive load against a real cluster / 
HDFS environment. And as you can see, this jira only refactors the write thread 
model rather than tuning any write sub-phase along the write path for an 
individual write request, so no obvious improvement is expected under low/ordinary 
write pressure.

If you have a real cluster environment with 4 data-nodes, it would be better to 
re-do the tests chunhui/I did, with a similar test configuration/load to what is 
listed in detail in the comments above. 1 client with 200 write threads is OK for 
pressing a single RS, and 4 clients each with 200 write threads for pressing 4 
RS.

Thanks again.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-28 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722098#comment-13722098
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


Hi [~fenghh], I ran in pseudo-distributed mode for all the tests, which means HBase 
against a real HDFS. Tests were with 1.1.2 and 1.2.0. The test was a 10 million 
randomWrite test. I ran 10 of them.

I will re-run some tests against a single node (still in the process of installing 
the OS on the other one) and will put more load against it. Like you said, 200 
threads doing 100 000 writes each... More to come...



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-24 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718252#comment-13718252
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


Hi [~fenghh],

I was in the process of giving it a try; is there a new version for 0.94 coming, or 
is the one attached to the JIRA already the right one?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-24 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718270#comment-13718270
 ] 

Feng Honghua commented on HBASE-8755:
-

[~jmspaggi] HBASE-8755-0.94-V1.patch is good. Let me know if there is any problem. 
Thanks, Jean-Marc.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-17 Thread Tianying Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711711#comment-13711711
 ] 

Tianying Chang commented on HBASE-8755:
---

[~zjushch] In your test you use 300 RPC handlers. Since the default is 10, and we 
use 50 in our production cluster, when do we really need that many RPC 
handlers? Would that be more overhead than benefit? Did you see write 
throughput go up when you increased the number of handlers? Is there any 
particular scenario where a high handler count helps?
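
For reference, the handler count is the hbase.regionserver.handler.count setting in hbase-site.xml. 
The snippet below is only an illustration (the class name and the workload placeholder are made up); 
it shows one way a test against a mini cluster could reproduce the 300-handler setup, while on a 
real cluster you would change hbase-site.xml and restart the region servers.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseTestingUtility;

// Hypothetical snippet: raise the RPC handler count for a mini-cluster test run.
public class HandlerCountExample {
  public static void main(String[] args) throws Exception {
    HBaseTestingUtility util = new HBaseTestingUtility();
    Configuration conf = util.getConfiguration();
    // The default is low (10 in these releases); the test above used 300.
    conf.setInt("hbase.regionserver.handler.count", 300);
    util.startMiniCluster();   // the region server started here picks up the handler count
    try {
      // ... drive the write workload against the mini cluster here ...
    } finally {
      util.shutdownMiniCluster();
    }
  }
}
{code}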



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-15 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708433#comment-13708433
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


Hi [~fenghh], the servers dedicated for those tests were too different from each other, 
so I think it would not have been a good idea to run on them (the results would have 
been difficult to interpret). So I just bought 3 absolutely identical nodes (the same as 
one I already have). By the end of the week I will have 4 servers with the same 
motherboard, same CPU, same memory (brand, P/N, etc.) and same hard drive! The master 
will have 1xSSD+1xSATA, the others will have 2xSATA. To start.

I will run YCSB on that setup with and without this patch as soon as I get the 
hardware and install the OS. That should be sometime later this week.

I will also see if I can run PE.

More to come.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-15 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708473#comment-13708473
 ] 

Feng Honghua commented on HBASE-8755:
-

Thanks a lot [~jmspaggi], looking forward to your result.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-07-14 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708189#comment-13708189
 ] 

Feng Honghua commented on HBASE-8755:
-

[~jmspaggi], what were your results from running YCSB against a real cluster 
environment?



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-22 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691138#comment-13691138
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


It was PerformanceEvaluation and not YCSB, and I ran it with only one thread. 
It's now running with 10 threads. I will have the results tomorrow I think, or 
maybe by the end of the day if it's going fast enough. I will run YCSB after that, 
but I am not sure it's relevant in standalone mode. I will still try. And I will 
add another dedicated test node very soon...



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-22 Thread Hangjun Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691178#comment-13691178
 ] 

Hangjun Ye commented on HBASE-8755:
---

It probably doesn't make a significant difference in standalone mode.
The point of this patch is to avoid heavy contention among multiple handler 
threads; we found a lot of contention happening in the HDFS client when many 
threads called write/hflush concurrently.

We tested on a single RS backed by a 5-data-node HDFS cluster using YCSB; 
would you like to try it on a similar environment?
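
A rough sketch of the old per-put path being described (names are invented, not the real HLog 
code) may make the contention concrete: every put serializes on the shared locks and, unless the 
syncTillHere check happens to cover it, drives its own hflush, so the HDFS client sees up to one 
sync per handler rather than one per batch.

{code:java}
// Rough illustration of the old per-put path; names are invented and hdfs calls are left as comments.
public class OldWritePathSketch {
  private final Object updateLock = new Object(); // every append serializes here (like HLog's updateLock)
  private final Object flushLock = new Object();  // every sync serializes here (like HLog's flushLock)
  private long lastAppendedTxid = 0; // guarded by updateLock
  private long lastSyncedTxid = 0;   // guarded by flushLock; the real field is called syncTillHere

  // What each put handler effectively did for every single put().
  public void appendAndSync(byte[] edit) {
    long txid;
    synchronized (updateLock) {
      /* hlog.writer.append(edit) would go here; the single log file forces serialization */
      txid = ++lastAppendedTxid;
    }
    synchronized (flushLock) {
      // The existing optimization: skip the sync if another handler's sync already covered this txid.
      if (lastSyncedTxid >= txid) {
        return;
      }
      /* otherwise this handler drives an hdfs hflush() itself; with dozens or hundreds of
         handlers these per-put syncs pile up on the HDFS client, which is the contention above */
      lastSyncedTxid = txid;
    }
  }
}
{code}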



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690238#comment-13690238
 ] 

Lars Hofhansl commented on HBASE-8755:
--

I ran some local tests, and even with a single-threaded client I did not see a 
performance degradation with this patch (0.94).
I inserted 10m small rows. The cluster consists of a single RS/DN, and HDFS is 
backed by an SSD. So the HDFS cost of writes should be low, and that in turn 
should bring out any new inefficiencies introduced by this patch.

So from my angle this patch is good.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-21 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690259#comment-13690259
 ] 

Jean-Marc Spaggiari commented on HBASE-8755:


Here are my results:
||Test||Trunk||8755||
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|437101.8|431561.7|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|774306.8|768559|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|21999.7|22277.6|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|134429.3|134370.9|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|112814.7|125324.2|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|78574.42|80064.5|

These numbers are times, so smaller is better. Overall the results are pretty much 
the same, except for the random writes, which are about 11% slower with the patch.
||Test||Difference||
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|1.27%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|0.74%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|-1.26%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|0.04%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|-11.09%|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|-1.90%|

I will redo the RandomWrite tests and try them with 1, 10 and 100 threads to see 
whether only single-threaded access is impacted...
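
For clarity on how to read the percentages above, here is a small illustration (PerfDelta is made 
up, not part of PerformanceEvaluation): positive means the patched run took less time than trunk, 
negative means it took longer.

{code:java}
// Illustration only: how the percentages in the second table can be derived from the first.
public class PerfDelta {
  // Positive = the patched run took less time than trunk (an improvement); negative = a regression.
  static double percentDiff(double trunkMillis, double patchedMillis) {
    return (trunkMillis - patchedMillis) / trunkMillis * 100.0;
  }

  public static void main(String[] args) {
    System.out.printf("RandomWriteTest: %.2f%%%n", percentDiff(112814.7, 125324.2));    // about -11.09
    System.out.printf("SequentialWriteTest: %.2f%%%n", percentDiff(78574.42, 80064.5)); // about -1.90
  }
}
{code}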




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690288#comment-13690288
 ] 

Lars Hofhansl commented on HBASE-8755:
--

Interesting. For writing the WAL it should not make any difference whether we 
write sequentially or at random.
A possible reason is that the batching there is less efficient; maybe that 
brings out the extra thread hand-off needed here.
(My test was the equivalent of a sequential write, but used a hand-coded test.)




[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690375#comment-13690375
 ] 

stack commented on HBASE-8755:
--

[~jmspaggi] Thanks boss.  What test is this?  YCSB w/ how many clients?  Thank 
you sir.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-20 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688936#comment-13688936
 ] 

Feng Honghua commented on HBASE-8755:
-

[~zjushch] We ran the same tests as yours, and below are the results:

  1). One YCSB client with 5/50/200 write threads respectively
  2). One RS with 300 RPC handlers and 20 regions (backed by a 5-data-node HDFS 
cluster running CDH 4.1.1)
  3). row-size = 150 bytes

||threads||row-count||new-throughput||new-latency||old-throughput||old-latency||
|5|20|3191|1.551(ms)|3172|1.561(ms)|
|50|200|23215|2.131(ms)|7437|6.693(ms)|
|200|200|35793|5.450(ms)|10816|18.312(ms)|

A). The difference is negligible with 5 YCSB client threads
B). The new model still shows a 3X+ improvement over the old model with 50/200 
threads

Could anybody else help run similar tests using the same test configuration as 
Chunhui?



