[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882220#comment-13882220 ] Hudson commented on HBASE-8755: --- FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #65 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/65/]) HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low contention) (stack: rev 1561450)
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedLogCloseException.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FailedSyncBeforeLogCloseException.java
* /hbase/trunk/hbase-server/pom.xml
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSWALEntry.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogFactory.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogKey.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogUtil.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/RingBufferTruck.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SyncFuture.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEdit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHLogRecordReader.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java
* /hbase/trunk/pom.xml

A new write thread model for HLog to improve the overall HBase write throughput
---
Key: HBASE-8755
URL: https://issues.apache.org/jira/browse/HBASE-8755
Project: HBase
Issue Type: Improvement
Components: Performance, wal
Reporter: Feng Honghua
Assignee: Feng Honghua
Priority: Critical
Fix For: 0.98.0, 0.99.0
Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out

In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy lock contention on updateLock and flushLock. The only existing optimization, checking the current syncTillHere txid in the hope that some other thread has already written/synced one's own txid to hdfs so that the write/sync can be omitted, actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
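As a rough illustration of the contention problem described above, the old per-handler cycle can be modeled as follows. This is a hypothetical sketch, not the actual pre-patch FSHLog code; the class and method names (OldStyleHLog, put, the stand-in lists) are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the old write path: every handler thread runs the
// whole append => write => sync cycle itself, so all handlers serialize on
// updateLock and flushLock for every single write.
class OldStyleHLog {
    private final Object updateLock = new Object(); // guards append/write
    private final Object flushLock = new Object();  // guards sync
    private final List<String> persisted = new ArrayList<>(); // stands in for hdfs
    private long lastTxid = 0;   // txid of the latest appended edit
    private long syncedTxid = 0; // highest txid known to be synced

    long put(String edit) {
        long txid;
        synchronized (updateLock) {      // 'append' + 'HLog writer append'
            persisted.add(edit);
            txid = ++lastTxid;
        }
        synchronized (flushLock) {       // 'HLog writer sync'
            // The lone optimization: skip the sync when another thread's sync
            // already covered our txid. In practice it helps less than hoped,
            // because handlers still queue on flushLock to find that out.
            if (syncedTxid < txid) {
                long upTo;
                synchronized (updateLock) { upTo = lastTxid; }
                syncedTxid = upTo;       // writer.sync() stand-in
            }
        }
        return txid;
    }

    long syncedTxid() { synchronized (flushLock) { return syncedTxid; } }
    int persistedCount() { synchronized (updateLock) { return persisted.size(); } }
}
```

Even with the skip-if-covered check, every put still passes through both monitors, which is the contention the new model removes.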
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882215#comment-13882215 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK #4856 (See [https://builds.apache.org/job/HBase-TRUNK/4856/]) HBASE-10156 FSHLog Refactor (WAS - Fix up the HBASE-8755 slowdown when low contention) (stack: rev 1561450)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867544#comment-13867544 ] ramkrishna.s.vasudevan commented on HBASE-8755: --- Nice one.
The change for the new write thread model is as below:
1. All put handler threads append their edits to HLog's local pending buffer (and notify the AsyncWriter thread that there are new edits in the local buffer);
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that covers their txid;
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits from HLog's local pending buffer and writing them to hdfs (hlog.writer.append), then notifying the AsyncFlusher thread that there are new writes to hdfs that need a sync;
4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes made by AsyncWriter, then notifying the AsyncNotifier thread that the sync watermark has increased;
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads that are waiting in the HLog.syncer() function;
6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads now do the job it did).
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
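The pipeline in the steps above can be sketched as follows. This is a heavily simplified, hypothetical model, not the actual FSHLog/HLog API; all names are invented for illustration, and the AsyncNotifier role is folded into the flusher for brevity:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the new pipeline: handlers only append and wait,
// while single AsyncWriter/AsyncFlusher threads batch the writes and syncs.
class MiniHLog {
    private final List<String> pending = new ArrayList<>();   // HLog's local pending buffer
    private final List<String> persisted = new ArrayList<>(); // stands in for hdfs
    private long lastTxid = 0;    // txid assigned to the latest appended edit
    private long writtenTxid = 0; // highest txid written out by AsyncWriter
    private long syncedTxid = 0;  // sync watermark advanced by AsyncFlusher

    MiniHLog() {
        Thread writer = new Thread(this::writerLoop, "AsyncWriter");
        Thread flusher = new Thread(this::flusherLoop, "AsyncFlusher");
        writer.setDaemon(true);
        flusher.setDaemon(true);
        writer.start();
        flusher.start();
    }

    // Steps 1-2: a put handler appends to the local buffer, notifies the
    // writer, then blocks until a sync covers its txid.
    synchronized long append(String edit) throws InterruptedException {
        pending.add(edit);
        long txid = ++lastTxid;
        notifyAll();                 // "new edits in local buffer"
        while (syncedTxid < txid) {
            wait();                  // step 5: woken when the watermark passes txid
        }
        return txid;
    }

    // Step 3: the single AsyncWriter drains the whole buffer and writes it
    // downstream as one batch, so N waiting handlers cost one write.
    private void writerLoop() {
        try {
            while (true) {
                List<String> batch;
                long upTo;
                synchronized (this) {
                    while (pending.isEmpty()) wait();
                    batch = new ArrayList<>(pending);
                    pending.clear();
                    upTo = lastTxid;
                }
                synchronized (this) {
                    persisted.addAll(batch); // hlog.writer.append stand-in
                    writtenTxid = upTo;
                    notifyAll();             // "new writes to hdfs need a sync"
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Step 4: the single AsyncFlusher issues one sync covering everything
    // written so far, then (step 5, AsyncNotifier's role) wakes the waiters.
    private void flusherLoop() {
        try {
            while (true) {
                synchronized (this) {
                    while (writtenTxid == syncedTxid) wait();
                    syncedTxid = writtenTxid; // writer.sync() stand-in
                    notifyAll();              // "sync watermark increased"
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    synchronized long syncedTxid() { return syncedTxid; }
    synchronized int persistedCount() { return persisted.size(); }
}
```

The throughput gain comes from amortization: however many handlers pile up while one write or sync is in flight, the next drain covers all of them with a single downstream call, instead of each handler contending for its own write/sync cycle.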
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847690#comment-13847690 ] stack commented on HBASE-8755: --- I did a compare over three hours, and throughput flips in favor of this patch the longer we run (15% more writes after three hours). Let me commit. This patch makes for better throughput when under high contention, cleans up the code, and lays groundwork for multiwal with its detaching of syncing from handlers, but I don't like the slowdown at low numbers. Let us fix that promptly. I created a subtask to address this and assigned myself.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847698#comment-13847698 ] Andrew Purtell commented on HBASE-8755: --- Yes [~stack], let's get this out into a near release so we can see how it holds up. If we see perf problems during RC evaluation, we can revert if necessary.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847958#comment-13847958 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK #4722 (See [https://builds.apache.org/job/HBase-TRUNK/4722/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550778)
* /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848055#comment-13848055 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-0.98 #11 (See [https://builds.apache.org/job/HBase-0.98/11/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550782)
* /hbase/branches/0.98/hbase-common/src/main/resources/hbase-default.xml
* /hbase/branches/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java
* /hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848092#comment-13848092 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #8 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/8/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550782)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848207#comment-13848207 ] Hudson commented on HBASE-8755: --- SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #5 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/5/]) HBASE-8755 A new write thread model for HLog to improve the overall HBase write throughput (stack: rev 1550778) * /hbase/trunk/hbase-common/src/main/resources/hbase-default.xml * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestDurability.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollAbort.java A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: Feng Honghua Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: 8755-syncer.patch, 8755trunkV2.txt, 8755v8.txt, 8755v9.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-trunk-v7.patch, HBASE-8755-v5.patch, thread.out In current write model, each write handler thread (executing put()) will individually go through a full 'append (hlog local buffer) = HLog writer append (write to hdfs) = HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race condition on updateLock and flushLock. The only optimization where checking if current syncTillHere txid in expectation for other thread help write/sync its own txid to hdfs and omitting the write/sync actually help much less than expectation. 
Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing HDFS sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model to HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.

The change for the new write thread model is as below:
1. All put handler threads append their edits to HLog's local pending buffer, and notify the AsyncWriter thread that there are new edits in the local buffer.
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that covers their txid.
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits from HLog's local pending buffer and writing them to HDFS (hlog.writer.append); it notifies the AsyncFlusher thread that there are new writes to HDFS that need a sync.
4. A single AsyncFlusher thread is responsible for issuing a sync to HDFS to persist the writes made by the AsyncWriter; it notifies the AsyncNotifier thread that the sync watermark has increased.
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads that are waiting in the HLog.syncer() function.
6. There is no LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads already do the same job).

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
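The numbered steps describe a producer/consumer pipeline with a sync watermark. The following is an illustrative toy sketch only, not the FSHLog code: the class and member names (WalPipelineSketch, pending, syncedTillHere) are invented, and the AsyncWriter/AsyncFlusher/AsyncNotifier stages are collapsed into a single background thread that advances the watermark and wakes the waiting handler threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch of the proposed model: handlers enqueue txids and block until the
// sync watermark covers them; one daemon thread plays the writer+flusher+notifier roles.
class WalPipelineSketch {
    private final BlockingQueue<Long> pending = new ArrayBlockingQueue<>(1024);
    private final AtomicLong syncedTillHere = new AtomicLong(0);   // sync watermark
    private final Object syncedLock = new Object();

    // Steps 3-5 collapsed: drain buffered txids ("append+sync" would happen here),
    // advance the watermark, and notify all waiting handlers.
    private final Thread asyncWriterFlusher = new Thread(() -> {
        try {
            while (true) {
                long txid = pending.take();                // block until new edits arrive
                synchronized (syncedLock) {
                    if (txid > syncedTillHere.get()) {
                        syncedTillHere.set(txid);
                    }
                    syncedLock.notifyAll();                // step 5: wake waiting handlers
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();            // shutdown
        }
    });

    WalPipelineSketch() {
        asyncWriterFlusher.setDaemon(true);
        asyncWriterFlusher.start();
    }

    // Steps 1-2: a put handler appends its edit, then waits for its txid to be synced.
    void appendAndSync(long txid) {
        try {
            pending.put(txid);
            synchronized (syncedLock) {
                while (syncedTillHere.get() < txid) {
                    syncedLock.wait();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    long syncedWatermark() {
        return syncedTillHere.get();
    }
}
```

Because only the single background thread touches the writer, the per-write contention on updateLock/flushLock in the old model disappears; handlers contend only briefly on the buffer and the watermark monitor.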
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846412#comment-13846412 ] stack commented on HBASE-8755:
--
All the Async* are just hanging out waiting. Missed notification or no notification?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846583#comment-13846583 ] stack commented on HBASE-8755:
--
I notice a syncer exited (it is missing from the thread dump).
2013-12-11 22:40:28,887 INFO [regionserver60020-AsyncHLogSyncer3-1386830380159] wal.FSHLog: regionserver60020-AsyncHLogSyncer3-1386830380159 exiting
On a new run, again a thread exits:
2013-12-12 10:34:34,620 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested
2013-12-12 10:34:35,526 DEBUG [regionserver60020.logRoller] wal.FSHLog: cleanupCurrentWriter waiting for transactions to get synced total 37240 synced till here 37210
2013-12-12 10:34:36,560 INFO [regionserver60020-AsyncHLogSyncer1-1386873205908] wal.FSHLog: regionserver60020-AsyncHLogSyncer1-1386873205908 exiting
Let me try and figure the why.
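A likely way a helper thread ends up "missing from the thread dump" is an unchecked exception escaping run(): the thread dies after printing only its "exiting" line. A hedged sketch (hypothetical class, not HBase code) of capturing the failure cause instead of exiting silently, which is essentially what the later "catching all exceptions" comment does to surface the NPE:

```java
// Hypothetical helper: a loop thread that records any escaping Throwable
// instead of letting it silently kill the thread.
class ResilientLoopThread extends Thread {
    private final Runnable body;
    private volatile Throwable lastError;

    ResilientLoopThread(Runnable body) {
        this.body = body;
    }

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                body.run();                        // one iteration of the sync loop
            }
        } catch (Throwable t) {
            lastError = t;                         // surface the cause for diagnosis
        }
    }

    /** Joins the thread and returns whatever error terminated it (or null). */
    Throwable awaitError() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastError;
    }
}
```

With only an INFO "exiting" log and no caught Throwable, the root cause (here, the NullPointerException found later) never appears in the logs.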
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846657#comment-13846657 ] stack commented on HBASE-8755:
--
Catching all exceptions, I got a NPE.
2013-12-12 11:03:43,870 INFO [Thread-14] regionserver.DefaultStoreFlusher: Flushed, sequenceid=2680455, memsize=129.3 M, hasBloomFilter=true, into tmp file hdfs://c2020.halxg.cloudera.com:8020/hbase/data/default/usertable/8dcf9c17c090f476346c8a31e4c9eddb/.tmp/64aaf14b38224f4fbce0a999f92dd8f4
2013-12-12 11:03:43,879 INFO [regionserver60020-AsyncHLogSyncer2-1386874930310] wal.FSHLog: UNEXPECTED
java.lang.NullPointerException
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1205)
at java.lang.Thread.run(Thread.java:744)
Writer is null.
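"Writer is null" suggests a race with the log roller: during a roll the writer reference is momentarily cleared or swapped while the syncer dereferences it. A minimal illustration (hypothetical names, not the actual FSHLog fix) of reading the volatile reference once into a local and skipping the sync while a roll is in progress:

```java
// Hypothetical sketch: guard a shared writer reference that a log roll may
// momentarily null out, instead of dereferencing it blindly.
class WriterGuardSketch {
    interface Writer {
        void sync();
    }

    private volatile Writer writer;          // null while a roll is in progress

    void beginRoll() {
        writer = null;                       // roller detaches the old writer
    }

    void finishRoll(Writer next) {
        writer = next;                       // roller attaches the new writer
    }

    /** Returns true if a sync was issued, false if a roll was in progress. */
    boolean trySync() {
        Writer w = writer;                   // single volatile read into a local
        if (w == null) {
            return false;                    // retry on the next wakeup; no NPE
        }
        w.sync();
        return true;
    }
}
```

The key detail is the single read into a local: checking `writer != null` and then dereferencing the field again would re-read it and could still observe null in between.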
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846984#comment-13846984 ] Hadoop QA commented on HBASE-8755:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618499/8755v8.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile. Here is a snippet of the errors:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure: Compilation failure:
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,42] unclosed string literal
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[436,63] ';' expected
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,17] illegal start of expression
[ERROR] /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java:[437,23] ';' expected
[ERROR] - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:213)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
--
Caused by: org.apache.maven.plugin.CompilationFailureException: Compilation failure
at org.apache.maven.plugin.AbstractCompilerMojo.execute(AbstractCompilerMojo.java:729)
at org.apache.maven.plugin.CompilerMojo.execute(CompilerMojo.java:128)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
... 19 more
{code}
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8148//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846994#comment-13846994 ] stack commented on HBASE-8755:
--
This is what I'll commit. I've been running it on small cluster this afternoon and after fixing hardware, it seems to run fine at about the same speed as what we have currently (ycsb read/write loading).
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846997#comment-13846997 ] stack commented on HBASE-8755:
--
32 threads on one node writing a cluster of 4 nodes (~8 threads per server, which according to our tests to date shows this model running slower than what we have). It does 10% less throughput after ~25 minutes. We need to get the other speedups in after this goes in.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847069#comment-13847069 ] Hadoop QA commented on HBASE-8755:
--
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618509/8755v9.txt against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8149//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847194#comment-13847194 ] Feng Honghua commented on HBASE-8755:
--
[~stack] thanks. seems no further blocking issue?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847195#comment-13847195 ] Feng Honghua commented on HBASE-8755:
--
bq. It does 10% less throughput after ~25 minutes.
It's normal when flush/compact occurs after ~25 minutes of writing; we saw the same level of downgrade when doing long-running tests both with and without the patch.
I applied this new write thread model to HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846090#comment-13846090 ] stack commented on HBASE-8755: -- Pardon me. This is taking a while; hardware issues, and now trunk seems to have an issue where it hangs syncing, pre-patch I believe... investigating. Here is what I see. Lots of threads BLOCKED here:
{code}
"RpcServer.handler=0,port=60020" daemon prio=10 tid=0x012f1800 nid=0x3cb5 waiting for monitor entry [0x7fdb0eb55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1006)
	- waiting to lock 0x000456c00390 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1054)
	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
	at java.lang.Thread.run(Thread.java:744)
{code}
Then the fella w/ the lock is doing this:
{code}
"regionserver60020.logRoller" daemon prio=10 tid=0x01159800 nid=0x3ca7 in Object.wait() [0x7fdb0f964000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1307)
	- locked 0x000456bf37a8 (a java.util.concurrent.atomic.AtomicLong)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1299)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1412)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:760)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:566)
	- locked 0x000456c00390 (a java.lang.Object)
	- locked 0x000456c00330 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
	at java.lang.Thread.run(Thread.java:744)
{code}
Server is bound up.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846107#comment-13846107 ] stack commented on HBASE-8755: -- Hmm... Happens when this patch is in place. Stuck here:
{code}
"regionserver60020.logRoller" daemon prio=10 tid=0x7f6f08822800 nid=0x5b0a in Object.wait() [0x7f6eeccef000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:503)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1304)
	- locked 0x00045756db98 (a java.util.concurrent.atomic.AtomicLong)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1296)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1409)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.cleanupCurrentWriter(FSHLog.java:759)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.rollWriter(FSHLog.java:565)
	- locked 0x00045756dc70 (a java.lang.Object)
	- locked 0x00045756dc10 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:96)
	at java.lang.Thread.run(Thread.java:744)
{code}
Which is here:
{code}
1294   // sync all known transactions
1295   private void syncer() throws IOException {
1296     syncer(this.unflushedEntries.get()); // sync all pending items
1297   }
1298
1299   // sync all transactions upto the specified txid
1300   private void syncer(long txid) throws IOException {
1301     synchronized (this.syncedTillHere) {
1302       while (this.syncedTillHere.get() < txid) {
1303         try {
1304           this.syncedTillHere.wait();
1305
1306           if (txid <= this.failedTxid.get()) {
1307             assert asyncIOE != null :
1308               "current txid is among(under) failed txids, but asyncIOE is null!";
1309             throw asyncIOE;
1310           }
1311         } catch (InterruptedException e) {
1312           LOG.debug("interrupted while waiting for notification from AsyncNotifier");
1313         }
1314       }
1315     }
1316   }
{code}
All other threads are trying to do an appendNoSync:
{code}
"RpcServer.handler=0,port=60020" daemon prio=10 tid=0x7f6f08a26800 nid=0x5b1b waiting for monitor entry [0x7f6eebee1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:1005)
	- waiting to lock 0x00045756dc70 (a java.lang.Object)
	at org.apache.hadoop.hbase.regionserver.wal.FSHLog.appendNoSync(FSHLog.java:1053)
	at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2369)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2087)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2037)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2041)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4175)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3424)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3328)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:28460)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
	at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
	at java.lang.Thread.run(Thread.java:744)
{code}
... but can't make progress, blocked on updateLock.
{code}
 995   private long append(HRegionInfo info, TableName tableName, WALEdit edits, List<UUID> clusterIds,
 996       final long now, HTableDescriptor htd, boolean doSync, boolean isInMemstore,
 997       AtomicLong sequenceId, long nonceGroup, long nonce) throws IOException {
 998     if (edits.isEmpty()) return this.unflushedEntries.get();
 999     if (this.closed) {
1000       throw new IOException("Cannot append; log is closed");
1001     }
1002     TraceScope traceScope = Trace.startSpan("FSHLog.append");
1003     try {
1004       long txid = 0;
1005       synchronized (this.updateLock) {
1006         // get the sequence number from the passed Long. In normal flow, it is coming from the
1007         // region.
1008         long seqNum = sequenceId.incrementAndGet();
...
{code}
The update lock is held when rolling the log here:
{code}
562     synchronized (updateLock) {
563       // Clean up
{code}
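The stall in these traces, rollWriter() holding updateLock while syncer() waits on the watermark, with every handler blocked trying to enter append(), can be reproduced in miniature. The sketch below is a hypothetical demo (RollStallDemo and its fields are illustrative names, and a timeout is added so the demo terminates), not HBase code:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the stall pattern above: the "roller" holds
// updateLock and then waits on the sync watermark (with a timeout here so the
// demo terminates; FSHLog's syncer() waits indefinitely), while a "handler"
// blocks trying to enter the same updateLock.
class RollStallDemo {
    static final Object updateLock = new Object();
    static final Object syncedTillHere = new Object(); // stands in for the watermark monitor

    // Returns true if the handler managed to enter updateLock while the roller
    // was still parked inside it (it should not manage to).
    static boolean handlerEnteredLock(long holdMillis) {
        CountDownLatch rollerHasLock = new CountDownLatch(1);
        CountDownLatch handlerDone = new CountDownLatch(1);
        Thread roller = new Thread(() -> {
            synchronized (updateLock) {          // rollWriter() takes updateLock...
                rollerHasLock.countDown();
                synchronized (syncedTillHere) {  // ...then syncer() waits on the watermark;
                    try {                        // wait() releases syncedTillHere, NOT updateLock
                        syncedTillHere.wait(holdMillis);
                    } catch (InterruptedException ignored) { }
                }
            }
        });
        Thread handler = new Thread(() -> {
            synchronized (updateLock) {          // append() needs updateLock too
                handlerDone.countDown();
            }
        });
        try {
            roller.start();
            rollerHasLock.await();               // roller is definitely inside updateLock now
            handler.start();
            // Wait half the hold time: the handler stays blocked the whole while.
            boolean entered = handlerDone.await(holdMillis / 2, TimeUnit.MILLISECONDS);
            roller.join();
            handler.join();
            return entered;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Unless some other thread advances the watermark and calls notifyAll on syncedTillHere, the roller never leaves syncer(), which is exactly why the Async* threads' stack traces are needed to see who is supposed to wake it.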
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846110#comment-13846110 ] stack commented on HBASE-8755: -- Will provide more on this tomorrow.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846130#comment-13846130 ] Feng Honghua commented on HBASE-8755: - The stack traces are OK: when a log roll occurs, it holds the updateLock to prevent any subsequent write handler from putting edits into pendingWrites (that's why all write handler threads pend on updateLock), and then it calls *sync* to wait for the Async* threads to sync all edits currently in pendingWrites (that's why the logroller pends in sync())... As to why there is no progress, we need to see why the Async* threads don't finish syncing the current pendingWrites; would you please provide the Async* threads' stack traces? Of the Async* threads, only AsyncWriter needs the pendingWritesLock (to grab the edits from pendingWrites), and AsyncNotifier needs syncTillHere (to update it and notifyAll). Both should be acquirable in the current situation: write handler threads can't hold pendingWritesLock without first holding updateLock (within append()), the logroller doesn't hold pendingWritesLock at all, and the logroller is only waiting for syncTillHere to change...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844108#comment-13844108 ] Hadoop QA commented on HBASE-8755: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12617842/HBASE-8755-trunk-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.
{color:red}-1 site{color}. The patch appears to cause the mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8118//console
This message is automatically generated.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844827#comment-13844827 ] stack commented on HBASE-8755: -- All sounds good above [~fenghh]. I'm running a few tests here on cluster to see it basically works. Any other comments by anyone else? Otherwise, I was planning on committing. We can work on further speedups in new issues; e.g. see if we can do with fewer threads as per [~himan...@cloudera.com].
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844838#comment-13844838 ] Ted Yu commented on HBASE-8755: --- +1 from me.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844839#comment-13844839 ] Jonathan Hsieh commented on HBASE-8755: --- I did most of a review of v4 last week -- here are a few nits: nit: (fix on commit)
{code}
+    // up and holds the lock
+    // NOTE! can't hold 'upateLock' here since rollWriter will pend
+    // on 'sync()' with 'updateLock', but 'sync()' will wait for
+    // AsyncWriter/AsyncSyncer/AsyncNotifier series. without upateLock
+    // can leads to pendWrites more than pendingTxid, but not problem
{code}
spelling: upate -> update. This can go in a follow-up issue -- and please add a description of the threads / queues / invariants and how a WAL write happens in the class javadoc. An updated version of the 1-6 list in the description would be great. Good stuff [~fenghh]!
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844859#comment-13844859 ] Himanshu Vashishtha commented on HBASE-8755: Thanks for the explanation [~fenghh]; it pretty much answers all my questions. Also, looking more, getting rid of the LogSyncer thread eases the locking semantics of rolling. Yes, I reran the above experiments in a more standard environment (4 DNs with HLogPE running on a DN, and log level set to INFO instead of DEBUG), and got mixed results this time. I varied threads from 2 to 100 and didn't get a clear winner. Given the current state of this patch and the cleanup it does, I am +1 for committing this. Looking forward to it. Thanks.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845035#comment-13845035 ] Feng Honghua commented on HBASE-8755: - Thanks [~jmhsieh], I've made and attached a new patch based on your comment. Thanks everyone again for your valuable comment and feedback.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845038#comment-13845038 ] Feng Honghua commented on HBASE-8755: - [~v.himanshu]: Looking forward to your performance comparison test results (trunk, with-this-patch, with-syncer-only) using the latest HLogPE with the new 'appendWithoutSync + sync' logic. I'll also try to do the same test for double comparison/confirmation. Performance improvement is the first and foremost goal of this patch; the code cleanup is just a by-the-way side effect. So we want to see this patch accepted/checked in because of the performance improvement it brings, not because of the code cleanup it does :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845103#comment-13845103 ] Hadoop QA commented on HBASE-8755: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618171/HBASE-8755-trunk-v7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8132//console This message is automatically generated. 
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845106#comment-13845106 ] stack commented on HBASE-8755: -- I'm running some tests local just to make sure. Will report back...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843224#comment-13843224 ] Feng Honghua commented on HBASE-8755: - [~stack]: thanks for the comments; below are my responses after making the corresponding changes. A new patch based on [~v.himanshu]'s latest v5 patch is attached (thanks [~v.himanshu]). bq. Remove these asserts rather than comment them out given they depended on a facility this patch removes. == done bq. using a Random for choosing an arbitrary thread from a list of 4 is heavyweight == done bq. Please remove all mentions of AsyncFlush since it no longer exists == done bq. Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE; Should it be == failedTxid? == this comment is right: a txid larger than failedTxid isn't synced by the sync that reports failedTxid, but a txid smaller than or equal to failedTxid is failed (not 'must have failed', but since we don't maintain a txid-range-to-syncer mapping, we fail all txids smaller than or equal to failedTxid; this aligns with HBase's write semantics of 'a failed write may in fact have succeeded'. This is a point we can refine later by adding a txid-range-to-sync-operation mapping to indicate failure precisely.) bq. This should be volatile since it is set by AsyncSync and then used by the main FSHLog thread (you have an assert to check it not null -- maybe you ran into an issue here already?): + private IOException asyncIOE = null; == done bq. 'bufferLock' is a very generic name. Could it be more descriptive? It is a lock held for a short while while AsyncWriter moves queued edits off the globally seen queue to a local queue just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held. Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list holding the pending writes is pendingWrites.) == done bq.
(because the HDFS write-method is pretty heavyweight as far as locking is concerned.) I think the heavyweight referred to in the above is hbase locking... please adjust the comment == done bq. Comments on what these threads do will help the next code reader == done bq. Your patch should remove the optional flush config from hbase-default.xml too since it no longer is relevant == done bq. A small nit is you might look at other threads in hbase and see how they are named... It might be good if these better align == done bq. Probably make the number of asyncsyncers a configuration == done bq. but we do not seem to be doing it on the other call to doWrite at around line #969 inside in append == doWrite is called inside append, and the bufferLock (now renamed to pendingWritesLock) is held there bq. This method (setPendingTxid) is only called at close time if I read the patch right == it's called inside append() once doWrite() is done, to notify AsyncWriter that there are new pendingWrites to write to HDFS; it's not called at close time. You can double-check it :-) bq. Is this 'fatal'? Or is it an 'error'? == done bq. and request a log roll yet we carry on to try and sync, an op that will likely fail? We are ok here? We updated the write txid but not the sync txid so that should be fine. == we can't retry the sync after a log roll since we can't sync to a new hlog when the writes were written to the old hlog. We fail all the transactions with txid <= write txid, so it's ok here. bq. Do we need this: if (!asyncSyncers[ i ].isSyncing())? DFSClient will allow us to call sync concurrently. == I think DFSClient allows us to call sync concurrently... HDFS will handle (synchronize) concurrent syncs? bq. Can these be static classes or do they need context from the hosting FSHLog? == they all need context from the hosting FSHLog (such as writer/asyncWriter/asyncSyncer/asyncNotifier) bq. These method names should not talk about 'flush'. They should be named 'sync' instead. Same for the flushlock.
== done bq. Why atomic boolean and not just a volatile here? private AtomicBoolean isSyncing = new AtomicBoolean(false); == done bq. The above is very important. All your threads do this? == yes bq. It talks about writeChunk being expensive but we are not doing anything to ameliorate dfsclient writes if I dig down into our log writer == done (removed it)
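The "fail everything at or below failedTxid" semantics defended in this review exchange can be illustrated with a small standalone helper. The class and method names are hypothetical (this is not code from the patch): without a txid-range-to-sync mapping, every waiter at or below the failed watermark must be failed, even though some of those writes may actually have reached HDFS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailedTxidSemantics {
    /**
     * Given the txids that handler threads are waiting on and the txid reported
     * by a failed sync, return the txids that must be failed. Everything at or
     * below failedTxid is failed because we cannot tell which of those writes
     * actually made it out; this matches HBase's "a failed write may in fact
     * have succeeded" semantics.
     */
    static List<Long> txidsToFail(List<Long> waitingTxids, long failedTxid) {
        List<Long> out = new ArrayList<>();
        for (long t : waitingTxids) {
            if (t <= failedTxid) out.add(t);
        }
        return out;
    }
}
```

In Himanshu's t1/t2 example, a sync failure reported at txid 10 fails the waiters on txids 8 and 10 alike, even if txid 8's write actually succeeded.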
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843227#comment-13843227 ] Feng Honghua commented on HBASE-8755: - I'll read and update according to [~v.himanshu]'s comment tomorrow. thanks.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843231#comment-13843231 ] Jean-Marc Spaggiari commented on HBASE-8755: For the random, you can also use something like System.currentTimeMillis() % asyncSyncers.length; Not saying that yours is not correct ;)
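For reference, the cheap picker variants discussed here (wall clock, txid, and a non-allocating random) could look like the following sketch. `poolSize` stands in for `asyncSyncers.length`, and the class/method names are illustrative, not from the patch; the txid-based pick has the nice property of spreading consecutive txids round-robin across the syncer pool.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SyncerPicker {
    // Pick an index into the syncer pool without constructing a new
    // java.util.Random per call (the "heavyweight" cost flagged in review).
    static int pickByTxid(long txid, int poolSize) {
        return (int) (txid % poolSize);        // round-robins consecutive txids
    }

    static int pickByClock(int poolSize) {
        return (int) (System.currentTimeMillis() % poolSize);
    }

    static int pickRandomly(int poolSize) {
        // ThreadLocalRandom reuses a per-thread generator: no allocation,
        // no seeding, and no contention between handler threads.
        return ThreadLocalRandom.current().nextInt(poolSize);
    }
}
```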
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843925#comment-13843925 ] Feng Honghua commented on HBASE-8755: - [~v.himanshu]: bq. 1) log rolling thread safety: Log rolling happens in parallel with flush/sync. Currently, in FSHLog, the sync call grabs the updateLock to ensure it has a non-null writer (because of parallel log rolling). How does this patch address the non-null writer? Or is it not needed anymore? Also, if you go for the updateLock in sync, that might result in deadlock. == Good question :-). In rollWriter(), before switching this.writer to the newly created writer, *updateLock is held* and *cleanupCurrentWriter() is called*. Holding updateLock guarantees that no new edits enter pendingWrites, and calling cleanupCurrentWriter() guarantees that all edits currently in pendingWrites have been written to HDFS and synced (inside this method, 'sync()' is called to provide this guarantee). This means that when switching this.writer to the new HLog writer, no new edits enter and all current edits have already been synced to HDFS; the AsyncWriter/AsyncSyncer threads have nothing to do and are idle, so there is no log rolling thread safety issue here. bq. 2) Error handling: It is not very clear how flush/sync failures are being handled?... Let's say there are two handlers waiting for sync, t1 on txid 8 and t2 on txid 10. And t1 wakes up on notification. Would t1 also get this exception? Wouldn't it be wrong, because txid 8 may have succeeded? Please correct me if I missed anything. == Your understanding here is correct, but the write semantics of HBase are 'a successful write response means a successful write, but a failed write response can mean either a successful write or a failed write', right? I have already mentioned this in the comment above.
This behavior can be improved by adding a txid-to-txid-range mapping, which could indicate exactly which pending txid range a failed txid fails. bq. 3) I think the current HLogPE doesn't do justice to the real use case... Almost all HLog calls are appendNoSync, followed by a sync call. In the current HLogPE, we are calling append calls, which also do the sync. == You're right, but looking at the write process as a whole, the two have no big performance difference: append() calls sync() inside, and in the real case of HRegion.java, appendNoSync and sync are called. Since sync now just pends on a notification, the difference between the two behaviors is only whether the thread pends on the notification sooner or later, with no impact on overall write performance (after putting its edits into pendingWrites, the write handler thread can only wait for the write/sync to HDFS to finish and can't help/influence write performance)... right? bq. 4) Perf numbers are super impressive. It would have been wonderful to have such numbers for a smaller number of handler threads also (e.g., 5-10 threads). IMHO, that represents the most common scenario, but I could be wrong. I know this has been beaten to death in the discussions above, but just echoing my thoughts here. == I think many folks may hold the same point of view here, but I also have my own thoughts on this :-) Throughput is different from latency: latency represents how quickly a system performs a user request, the quicker the better; while throughput represents how many user requests a system can perform within a given time frame, the more the better. Both indicate a system's capability for performing user requests, but from different angles.
Certainly, for applications with low-to-medium write stress, fewer write threads within the client issuing requests is OK; but for applications with high write stress, users/clients may feel bad if the system just can't serve/reach their real-world throughput, *no matter* how many client threads are configured/added. With the improvement of this patch, at least we *can* satisfy such a high throughput requirement by adding client threads. And without improving an individual write request's latency (this patch does nothing for individual write latency), it's hard to improve throughput for a *synchronous* client write thread (it can issue the next request only after it's done with the current one); that's why this patch has less effect for a single or few write threads. If HBase supported *asynchronous* client write threads, I think this patch could provide a big improvement even with few client write threads. bq. 5) I should also mention that while working on a different use case, I was trying to bring a layer of indirection b/w regionserver handlers and sync operation (Sync is the most costly affair in all HLog story). ... == Glad to see a similar approach :-). I wonder if it shows the same improvement under heavy write stress (such as 50/100/200 client write threads). Looking forward to seeing your final test results :-)
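The log-rolling ordering argued in 1) above (hold updateLock so no new edits enter, drain and sync what is buffered, only then swap the writer) can be sketched as follows. This is an illustrative stand-in with invented names, not the actual FSHLog code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the rollWriter() safety argument: while updateLock is held,
// no append() can add to pendingWrites, so draining + "syncing" the buffer and
// then swapping the writer leaves nothing in flight against the old writer.
public class RollWriterSketch {
    private final Object updateLock = new Object();
    private final List<String> pendingWrites = new ArrayList<>();
    private final List<String> durable = new ArrayList<>(); // stands in for hdfs
    private String writer = "hlog-1";

    void append(String edit) {
        synchronized (updateLock) { pendingWrites.add(edit); }
    }

    void rollWriter(String newWriter) {
        synchronized (updateLock) {          // blocks new appends for the duration
            durable.addAll(pendingWrites);   // cleanupCurrentWriter(): write + sync
            pendingWrites.clear();           // all buffered edits before the switch
            writer = newWriter;              // safe: buffer empty, nothing in flight
        }
    }

    public static void main(String[] args) {
        RollWriterSketch log = new RollWriterSketch();
        log.append("edit-a");
        log.append("edit-b");
        log.rollWriter("hlog-2");
        System.out.println("durable=" + log.durable.size() + " writer=" + log.writer);
    }
}
```

The key ordering property is that the swap happens inside the same critical section as the drain, so no edit can land in pendingWrites between the sync and the writer switch.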
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843926#comment-13843926 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] bq. For the random, you can also use something like System.currentTimeMillis() % asyncSyncers.length == Yeah, System.currentTimeMillis() is a good candidate. txid can do the same thing as well. Thanks :-) A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Attachments: 8755-syncer.patch, 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch, HBASE-8755-trunk-v6.patch, HBASE-8755-v5.patch In the current write model, each write handler thread (executing put()) will individually go through a full 'append (hlog local buffer) -> HLog writer append (write to hdfs) -> HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race condition on updateLock and flushLock. The only optimization, where a thread checks whether the current syncTillHere >= its txid in the expectation that some other thread has helped write/sync its txid to hdfs so it can omit the write/sync, actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing the hdfs sequence file, and the prototype implementation shows a 4X improvement for throughput (from 17000 to 70000+).
I apply this new write thread model in HLog, and the performance test in our test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested. The change for the new write thread model is as below:
1. All put handler threads append the edits to HLog's local pending buffer (and notify the AsyncWriter thread that there are new edits in the local buffer);
2. All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that contains their txid;
3. A single AsyncWriter thread is responsible for retrieving all the buffered edits in HLog's local pending buffer and writing them to hdfs (hlog.writer.append), then notifying the AsyncFlusher thread that there are new writes to hdfs that need a sync;
4. A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter, then notifying the AsyncNotifier thread that the sync watermark has increased;
5. A single AsyncNotifier thread is responsible for notifying all pending put handler threads waiting in the HLog.syncer() function;
6. No LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads already do the same job).
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
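Steps 1-5 above can be sketched as a minimal, hypothetical Java pipeline. A BlockingQueue stands in for the pending buffer, a long watermark for the hdfs sync state, and steps 4 and 5 are merged into one thread for brevity; none of the names below are the actual FSHLog code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Handler -> AsyncWriter -> AsyncSyncer/Notifier handoff chain in miniature.
public class WalPipelineSketch {
    // Step 1: handlers append edits (here just txids) to a pending buffer.
    static final BlockingQueue<Long> pendingWrites = new LinkedBlockingQueue<>();
    // Step 3 output: txids appended to the (simulated) hdfs file, awaiting sync.
    static final BlockingQueue<Long> unsynced = new LinkedBlockingQueue<>();
    // Steps 4-5: highest txid known durable; handlers wait on this monitor.
    static final Object syncMonitor = new Object();
    static long syncedUpTo = 0;

    public static void main(String[] args) throws Exception {
        Thread asyncWriter = new Thread(() -> {            // step 3: drain buffer
            try { while (true) unsynced.put(pendingWrites.take()); }
            catch (InterruptedException e) { }
        });
        Thread asyncSyncer = new Thread(() -> {            // steps 4-5: sync + notify
            try {
                while (true) {
                    long txid = unsynced.take();           // a real hsync would go here
                    synchronized (syncMonitor) { syncedUpTo = txid; syncMonitor.notifyAll(); }
                }
            } catch (InterruptedException e) { }
        });
        asyncWriter.setDaemon(true); asyncSyncer.setDaemon(true);
        asyncWriter.start(); asyncSyncer.start();

        // Steps 1-2: a handler appends txid 42 and blocks until the sync covers it.
        pendingWrites.put(42L);
        synchronized (syncMonitor) {
            while (syncedUpTo < 42L) syncMonitor.wait();
        }
        System.out.println("txid 42 synced");
    }
}
```

The point of the structure is that handlers only enqueue and wait: the single writer batches whatever has accumulated, and one sync can cover many handlers' txids at once.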
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841093#comment-13841093 ] Andrew Purtell commented on HBASE-8755: --- This work looks pretty far along. If we can get it in soon I would be willing to try putting this into 0.98 so this major improvement can manifest in a release. Exciting results.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840001#comment-13840001 ] Himanshu Vashishtha commented on HBASE-8755: This is some awesome stuff happening here. I have a few comments apart from what Stack already mentioned.
1) log rolling thread safety: Log rolling happens in parallel with flush/sync. Currently, in FSHLog, the sync call grabs the updateLock to ensure it has a non-null writer (because of parallel log rolling). How does this patch address the non-null writer? Or is it not needed anymore? Also, if you go for the updateLock in sync, that might result in deadlock.
2) Error handling: It is not very clear how flush/sync failures are being handled. For example, if a write fails for txid 10, it notifies AsyncSyncer that the writer is done with txid 10. And AsyncSyncer notifies the notifier thread, which finally notifies the blocked handlers using notifyAll. The handler checks for the failedTxid here: {code} + if (txid <= this.failedTxid.get()) { {code} Let's say there are two handlers waiting for sync, t1 on txid 8 and t2 on txid 10, and t1 wakes up on notification. Would t1 also get this exception? Wouldn't that be wrong, because txid 8 may have succeeded? Please correct me if I missed anything.
3) I think the current HLogPE doesn't do justice to the real use case. Almost *all* HLog calls are appendNoSync, followed by a sync call. In the current HLogPE, we are calling append calls, which also do the sync. When I changed it to represent the above common case, the performance numbers of current FSHLog using HLogPE improve quite a bit. I still need to figure out the reason, but the HLogPE change affects the perf numbers considerably IMO.
For example, on a 5 node cluster, I see this difference on trunk: Earlier: {code} Summary: threads=3, iterations=10 took 218.590s 1372.432ops/s {code} Post HLogPE change: {code} Summary: threads=3, iterations=10 took 172.648s 1737.640ops/s {code} I think it would be great if we can test this with the corrected HLogPE. What do you think Fenghua?
4) Perf numbers are super impressive. It would have been wonderful to have such numbers for a smaller number of handler threads also (e.g., 5-10 threads). IMHO, that represents the most common case scenario, but I could be wrong. I know this has been beaten to death in the discussions above, but just echoing my thoughts here.
5) I should also mention that while working on a different use case, I was trying to bring a layer of indirection b/w regionserver handlers and the sync operation (sync is the most costly affair in the whole HLog story). What resulted is a separate set of syncer threads, which do the work of flushing the edits and syncing to HDFS. This is what it looks like:
a) The handlers append their entries to the FSHLog buffer as they do currently.
b) They invoke the sync API. There, they wait on the syncer threads to do the work for them and notify.
c) It results in a batched sync effort but without much extra locking/threads.
Basically, it is similar to what you did here but minus the Writer/Notifier threads and minus the bufferLock. I mention it here because with that approach I see some perf improvement even with a smaller number of handler threads. And it also keeps the current deferredLogFlush behavior. This work is still in progress and is still a prototype. It would be great to know your take on it. Here are some numbers on a 5 node cluster; hdfs 2.1.0; hbase is trunk; client on a different node. I haven't tested it with a larger number of threads but it would be good to compare I think (it's 2am here...). It uses 3 syncers at the moment (and varying it would be a good thing to experiment with too).
Also, 'without patch' in the below table means trunk + the HLogPE patch.
||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|3|172.648s|1737.640ops/s|170.332s|1761.266ops/s|
|3|170.977s|1754.622ops/s|174.568s|1718.528ops/s|
|5|213.738s|2339.313ops/s|191.119s|2616.171ops/s|
|5|211.072s|2368.860ops/s|189.671s|2636.144ops/s|
|10|254.641s|3926.419ops/s|216.494s|4619.319ops/s|
|10|251.503s|3978.564ops/s|215.333s|4643.266ops/s|
|10|251.692s|3970.579ops/s|217.151s|4605.854ops/s|
|20|648.943s|6163.870ops/s|646.279s|6189.277ops/s|
|20|658.654s|6072.991ops/s|656.277s|6094.987ops/s|
|25|282.654s|8861.991ops/s|249.277s|10033.987ops/s|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840074#comment-13840074 ] stack commented on HBASE-8755: -- Nice review [~himan...@cloudera.com] On 3., above, yes, that is true. HLogPE does not seem to be representative, as you suggest. But does your change below
{code}
-hlog.append(hri, hri.getTable(), walEdit, now, htd, region.getSequenceId());
+// this is how almost all users of HLog use it (all but compaction calls).
+long txid = hlog.appendNoSync(hri, hri.getTable(), walEdit, clusters, now, htd,
+    region.getSequenceId(), true, nonce, nonce);
+hlog.sync(txid);
{code}
... bring it closer to a 'real' use case? I see over in HRegion that we do a bunch of appendNoSync calls in minibatch, or even in put, before we call sync. Should we append more than just one set of edits before we call the sync? I suppose on a regionserver with a load of regions loaded up on it, all these syncs can come crashing in on top of each other onto the underlying WAL in an arbitrary manner -- something Feng Honghua's patch mitigates some by making it so syncs are done when FSHLog thinks it appropriate rather than when some arbitrary HRegion call thinks it right ... and this is probably part of the reason for the perf improvement. Could we better regulate the sync calls so they are even less arbitrary? Even them out? It could make for better performance if there were a mechanism against syncs clumping together. Looking at your patch, the syncer is very much like Feng Honghua's -- it is interesting that you two independently came up w/ a similar multithreaded syncing mechanism. That would seem to 'prove' this is a good approach. Feng's patch is much further along, with a bunch of cleanup of FSHLog. Will wait on his comments on what he thinks of doing without AsyncWriter and AsyncNotifier. Looks like your patch is far enough along for us to do tests comparing the approaches?
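The appendNoSync-then-sync pattern stack describes (batch several cheap appends, then pay for one sync covering the whole batch) can be shown with a tiny standalone sketch. FakeLog is a made-up stand-in for HLog so the pattern runs without HBase; only the appendNoSync/sync shape mirrors the diff above:

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates why batching matters: N appendNoSync calls cost one sync, not N.
public class BatchedSyncSketch {
    static class FakeLog {
        private long txid = 0;
        final List<Long> syncs = new ArrayList<>();        // records each sync issued
        long appendNoSync(String edit) { return ++txid; }  // buffer only, no I/O
        void sync(long upTo) { syncs.add(upTo); }          // one sync covers the batch
    }

    public static void main(String[] args) {
        FakeLog hlog = new FakeLog();
        int batchSize = 4;
        long lastTxid = 0;
        for (int i = 0; i < batchSize; i++) {
            lastTxid = hlog.appendNoSync("edit-" + i);     // cheap: just buffers
        }
        hlog.sync(lastTxid);                               // single expensive sync
        System.out.println("syncs=" + hlog.syncs.size() + " upTo=" + lastTxid);
    }
}
```

With the old append() (which synced internally), the same four edits would have cost four syncs; this is the gap the corrected HLogPE is meant to measure.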
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840663#comment-13840663 ] Jonathan Hsieh commented on HBASE-8755: --- I posted v4 for trunk on reviewboard and will be reviewing there. https://reviews.apache.org/r/16052/
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840710#comment-13840710 ] stack commented on HBASE-8755: -- Let me look into changing HLogPE after Himanshu's suggestion above. I'll add an option to batch up edits some. My sense is that it will make the difference between current code and Honghua's patch even larger when the count of threads is low but that it will not change the numbers when thread count is high.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840733#comment-13840733 ] Himanshu Vashishtha commented on HBASE-8755: I agree on the HLogPE comment. Yes, grouping appendNoSync calls before calling sync is the right way to go. The existing one is way off the real use. I figured that the 'w/o patch' column in the above table contains Feng's patch too. I am re-running the experiments at the moment with three versions: trunk, trunk + Feng's patch, trunk + the syncer approach. All versions have that HLogPE fix on them; I will report back the numbers once done. Going by what we are seeing here, batching sync calls is definitely the right way IMO. I agree that Feng Honghua's patch has been tested well enough, and I really like the radical cleanup it does. The code reads pretty clean now; though it involves far more threads and synchronization stuff (which makes it more interesting to debug too :)), I just wanted to ensure that it is safe. The reason I mentioned this different approach is that it adds fewer threads (while keeping the current behaviour), and also shows improvement with a smaller number of handlers, which to me looks like a nice win over the current FSHLog. This is still at a prototype stage, and I absolutely don't want to block Feng's superb piece of work here. It would be good to know his thoughts on this. Thanks.
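A pool of syncer threads, like the 3-syncer prototype mentioned above, needs a cheap way to pick which syncer handles a request; the comments earlier discuss txid (or System.currentTimeMillis()) modulo pool size instead of a Random. A minimal sketch of that selection scheme, with invented names and the pool reduced to a counter array:

```java
// Hypothetical illustration of spreading sync work across a small syncer pool
// via txid % pool-size, one of the selection schemes discussed in the comments.
public class SyncerPoolSketch {
    public static void main(String[] args) {
        final int POOL = 3;                   // e.g. the 3 syncers in the prototype
        int[] workPerSyncer = new int[POOL];
        for (long txid = 1; txid <= 12; txid++) {
            int chosen = (int) (txid % POOL); // cheap selection, no Random needed
            workPerSyncer[chosen]++;
        }
        for (int i = 0; i < POOL; i++) {
            System.out.println("syncer-" + i + " handled " + workPerSyncer[i]);
        }
    }
}
```

Because consecutive txids cycle through the pool, the modulo scheme spreads load evenly without the allocation and seeding cost of a Random instance.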
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840812#comment-13840812 ] Feng Honghua commented on HBASE-8755: - Thanks very much for [~v.himanshu] and [~stack]'s review and further improvement suggestions. Currently I have some other stuff on hand to handle... I'll try my best to come back soon and read through the above comments. Thanks again.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839662#comment-13839662 ] stack commented on HBASE-8755: -- Here is more review on the patch. Make the changes suggested below and I'll +1 it. (Discussion off-line w/ Feng on this issue helped me better understand this patch and put to rest any notion that there is an easier 'fix' than the one proposed here. That said, there is much room for improvement, but this can be done in a follow-on.) Remove these asserts rather than commenting them out, given they depended on a facility this patch removes. Leaving them in will only make the next reader of the code -- very likely lacking the context you have -- feel uneasy, thinking someone removed asserts just to get tests to pass.
-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
On the below... +import java.util.Random; ... using a Random for choosing an arbitrary thread from a list of 4 is heavyweight. Can you not take the last digit of the timestamp or nano timestamp or some attribute of the edit instead? Something more lightweight? Please remove all mentions of AsyncFlush since it no longer exists: // all writes pending on AsyncWrite/AsyncFlush thread with Leaving it in will confuse readers when they can't find any such thread class. Is this comment right? // txid <= failedTxid will fail by throwing asyncIOE Should it be >= failedTxid? This should be volatile since it is set by AsyncSyncer and then used by the main FSHLog thread (you have an assert to check it is not null -- maybe you ran into an issue here already?): + private IOException asyncIOE = null; bq. + private final Object bufferLock = new Object(); 'bufferLock' is a very generic name. Could it be more descriptive?
It is a lock held briefly while AsyncWriter moves queued edits off the globally seen queue to a local queue just before we send the edits to the WAL. You add a method named getPendingWrites that requires this lock be held. Could we tie the method and the lock together better? Name it pendingWritesLock? (The name of the list that holds the pending writes is pendingWrites.) bq. ...because the HDFS write-method is pretty heavyweight as far as locking is concerned. I think the 'heavyweight' referred to above is hbase locking, not hdfs locking as the comment would imply. If you agree (you know this code better than I do), please adjust the comment. Comments on what these threads do will help the next code reader. AsyncWriter does the adding of edits to HDFS. AsyncSyncer needs a comment because the name is oxymoronic (though it makes sense in this context). In particular, a comment should draw out why we need so many instances of a syncer thread, because everyone's first thought here is going to be: why do we need this? Ditto for the AsyncNotifier. In the reviews above, folks have asked why we need this thread at all, and a code reader will likely think similarly on a first pass. Bottom line: your patch raised questions from reviewers; it would be cool if the questions were answered in code comments where possible so the questions do not come up again. 4 + private final AsyncWriter asyncWriter; 5 + private final AsyncSyncer[] asyncSyncers = new AsyncSyncer[5]; 6 + private final AsyncNotifier asyncNotifier; You remove the LogSyncer facility in this patch. That is good (we need to note this in the release notes). Your patch should remove the optional flush config from hbase-default.xml too, since it is no longer relevant. 3 -this.optionalFlushInterval = 4 - conf.getLong("hbase.regionserver.optionallogflushinterval", 1 * 1000); I see it here...
hbase-common/src/main/resources/hbase-default.xml: <name>hbase.regionserver.optionallogflushinterval</name> A small nit is you might look at other threads in hbase and see how they are named... 3 +asyncWriter = new AsyncWriter("AsyncHLogWriter"); Ditto here: + asyncSyncers[i] = new AsyncSyncer("AsyncHLogSyncer" + i); Probably make the number of asyncSyncers a configuration (you don't have to put the option out in hbase-default.xml; just make it so that if someone reading the code trips over this issue, they can change it by adding to hbase-site.xml w/o having to change code -- let's not reproduce the hard-coded '80' that is in the head of dfsclient we discussed yesterday -- smile). ... and here: asyncNotifier = new AsyncNotifier("AsyncHLogNotifier"); Not important, but check out how other threads are named in hbase. It might be good if these aligned better. Maybe make a method for shutting down all these threads, or use the Threads#shutdown method in Threads.java? bq. LOG.error("Exception while waiting for AsyncNotifier threads to die", e); Do LOG.error(Exception
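The "configurable, but not advertised in hbase-default.xml" pattern stack asks for could be sketched as follows. The property key below is a hypothetical name (not an actual HBase property), and `java.util.Properties` stands in for Hadoop's `Configuration` so the example stays self-contained.

```java
import java.util.Properties;

// Sketch of making the syncer count tunable via site configuration while
// keeping the default hard-coded in code (nothing added to hbase-default.xml).
// Key name and class are illustrative assumptions, not FSHLog's actual code.
public class SyncerConfig {
    static final String KEY = "hbase.regionserver.hlog.asyncer.number"; // hypothetical key
    static final int DEFAULT_SYNCERS = 5; // the hard-coded value the patch uses

    static int syncerCount(Properties conf) {
        String v = conf.getProperty(KEY);
        return v == null ? DEFAULT_SYNCERS : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();     // empty, like a stock deployment
        System.out.println(syncerCount(conf));  // falls back to the default of 5

        conf.setProperty(KEY, "8");             // operator override in hbase-site.xml
        System.out.println(syncerCount(conf));  // now 8, with no code change
    }
}
```

With Hadoop's real `Configuration`, the same shape is a single `conf.getInt(KEY, DEFAULT_SYNCERS)` call.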
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838679#comment-13838679 ] stack commented on HBASE-8755: -- Sorry for the delay getting back to this [~fenghh], and thanks for the explanation. I am having trouble reviewing the patch because I am trying to understand what is going on in FSHLog. It is hard to follow (not your patch necessarily, but what is there currently) in spite of multiple reviews. I keep trying to grok what is going on because this is critical code. The numbers are hard to argue with, and the patch does some nice cleanup of FSHLog which makes it easier to understand. We could commit this patch and then work on undoing the complexity that is rife here; your patch adds yet more because it adds interacting threads w/ new synchronizations, notifications, AtomicBoolean states, etc., which cost performance-wise, but at least it is clearer what is going on and we have tools for comparing approaches now. We could work on simplification and removal of sync points in a follow-on (see below for a note on one approach). I now get why we need multiple syncers. It is a little counter-intuitive: on the one hand we want to batch up edits more to get more performance, but on the other, we have to sync more often because a sync is outstanding for too much time, so much time that it holds up handlers too long. + I am trying to understand why we keep the edits aside in a linked-list. This was there before your time; you just continue the practice. The original comment says: We keep them cached here instead of writing them to HDFS piecemeal, because the HDFS write-method is pretty heavyweight as far as locking is concerned. Yet, when we eventually flush the edits, we don't do anything special; we just call write on the dfsoutputstream. We are not avoiding locking in hdfs. It must be the hbase flush/update locking that is being referred to here.
+ AsyncSyncer is a confounding name for a class -- but it makes sense in this context. The flush object in this thread is a syncer synchronization object, not for memstore flushes... as I thought it was (there is use of 'flush' in here when it probably should be 'sync' to be consistent). Off-list, a few other lads are interested in reviewing this patch (it is a popular patch!)... our [~j...@cloudera.com] and possibly [~himan...@cloudera.com], because they are getting stuck in this area. If they don't get to it soon, I'll commit unless there is an objection. A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy race conditions on updateLock and flushLock. The only optimization (checking the current syncTillHere txid in the expectation that another thread will write/sync one's own txid to hdfs, so the write/sync can be omitted) actually helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing the hdfs sequence file, and the prototype implementation shows a 4X improvement in throughput (from 17000 to 7+).
I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested. The change for the new write thread model is as below:
1 All put handler threads append the edits to HLog's local pending buffer; (each notifies the AsyncWriter thread that there are new edits in the local buffer)
2 All put handler threads wait in the HLog.syncer() function for the underlying threads to finish the sync that contains their txid;
3 A single AsyncWriter thread is responsible for retrieving all the buffered edits in HLog's local pending buffer and writing them to hdfs (hlog.writer.append); (it notifies the AsyncFlusher thread that there are new writes to hdfs that need a sync)
4 A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter; (it notifies the AsyncNotifier thread that the sync watermark has increased)
5 A single AsyncNotifier thread is responsible for notifying all pending put handler threads which are waiting in the HLog.syncer() function
6 No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads always do the same job it did)
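The staged handoff described in the steps above can be sketched as a toy Java pipeline. This is our illustration, not the patch's code: the class and field names (buffered/written/synced watermarks) are assumptions, the real FSHLog uses wait/notify between dedicated AsyncWriter/AsyncSyncer/AsyncNotifier threads rather than polling, and a real sync hits hdfs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of the pipeline: handlers append (raising "buffered"), a
// writer-like thread raises "written", a syncer-like thread raises
// "synced", and a handler blocked in syncer(txid) proceeds once the
// synced watermark covers its txid. All names are illustrative.
public class WalPipelineSketch {
    final AtomicLong buffered = new AtomicLong(); // highest txid appended by handlers
    final AtomicLong written = new AtomicLong();  // highest txid pushed to (simulated) hdfs
    final AtomicLong synced = new AtomicLong();   // highest txid durably (simulated) synced

    long append() { return buffered.incrementAndGet(); } // handler side

    // Handler blocks here until the sync watermark covers its txid.
    void syncer(long txid) throws InterruptedException {
        while (synced.get() < txid) Thread.sleep(1);
    }

    void runBackgroundThreads(long lastTxid) {
        Thread writer = new Thread(() -> {        // plays the single AsyncWriter
            while (written.get() < lastTxid) written.set(buffered.get());
        });
        Thread syncerThread = new Thread(() -> {  // plays one AsyncSyncer
            while (synced.get() < lastTxid) synced.set(written.get());
        });
        writer.start();
        syncerThread.start();
    }

    public static void main(String[] args) throws Exception {
        WalPipelineSketch wal = new WalPipelineSketch();
        wal.runBackgroundThreads(100);
        for (int i = 0; i < 100; i++) {
            long txid = wal.append();
            wal.syncer(txid); // returns only after synced >= txid
        }
        System.out.println("all edits synced, watermark=" + wal.synced.get());
    }
}
```

The point of the model is that handlers never write or sync themselves; they only wait on a monotonically increasing watermark, which is what removes the contention on updateLock/flushLock.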
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13838687#comment-13838687 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack], and also the review from [~j...@cloudera.com] and [~himan...@cloudera.com]; review from other experienced guys is welcome too, it's better to have this patch reviewed by as many guys as possible. Any question on this patch is welcome :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13832233#comment-13832233 ] Feng Honghua commented on HBASE-8755: - Tried 4 asyncSyncer threads; below are the results. A bit worse than 5 threads, but looks acceptable? ||threads||ops-wo-patch||ops-w-patch||ops-diff|| |1|1716|1572|-8.4%| |3|3179|3189|+0.3%| |5|6091|5593|-8.1%| |10|8760|9450|+7.8%| |25|13019|18055|+38.7%| |50|14995|26597|+77.3%| |100|18824|51441|+173.2%| |200|18144|61531|+239.1%| Additional explanation on correctness when introducing extra asyncSyncer threads: - when a txid (t0) is notified, all txids smaller than t0 must already have been written to hdfs and sync-ed: before t0 is notified, t0 must be sync-ed by an asyncSyncer thread; before t0 is sync-ed, t0 must be written to hdfs by the asyncWriter thread; before t0 is written to hdfs, all txids smaller than t0 must be written to hdfs, so the sync of t0 guarantees that all txids smaller than t0 are sync-ed (either before the sync of t0, or by the sync of t0) - when a txid (t0) can't find a free (idle) asyncSyncer thread and is added to a random one, it won't be sync-ed until its asyncSyncer thread is done with the txid at hand. But its entries have already been written to hdfs, and if any txid bigger than t0 (say t1) is successfully sync-ed by another parallel asyncSyncer thread, that sync guarantees t0 is also successfully sync-ed; hence when t1 is notified, t0 can also be correctly notified. Any further comments?
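The correctness argument above rests on the single writer pushing txids to hdfs in increasing order, so a completed sync observed at a higher txid covers every lower one. A minimal sketch (our names, not FSHLog's):

```java
// Illustration of the "sync watermark" reasoning above: because txids are
// written in increasing order, a successful sync that covers t1 also makes
// every t0 <= t1 durable, even if t0 was parked on a busy syncer thread.
// Class and method names are illustrative assumptions.
public class WatermarkCoverage {
    private long highestSynced = 0; // shared sync watermark, only ever raised

    // A syncer thread that completes a sync raises the watermark to the
    // largest txid known to be written before the sync started.
    void syncCompleted(long txidCoveredBySync) {
        highestSynced = Math.max(highestSynced, txidCoveredBySync);
    }

    boolean isDurable(long txid) {
        return txid <= highestSynced;
    }

    public static void main(String[] args) {
        WatermarkCoverage w = new WatermarkCoverage();
        // t0 = 7 was handed to a busy syncer; a sibling syncer then completes
        // a sync covering t1 = 10. That sync makes t0 durable as well.
        w.syncCompleted(10);
        System.out.println(w.isDurable(7));  // true: 7 <= 10
        System.out.println(w.isDurable(11)); // false: not yet synced
    }
}
```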
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830678#comment-13830678 ] Feng Honghua commented on HBASE-8755: - bq. Do these no longer pass? => yes, under the new thread model there is no explicit method to do the sync, and we can't tell if there are outstanding deferred entries (the AsyncWriter/AsyncSyncer threads do write/sync in a best-effort way) bq. We have hard-coded 5 asyncSyncers? Why 5? => yes, I tried 2/3/5/10 and found 5 is the best number (2/3 have worse perf, 10 has equal perf but introduces too many extra threads) bq. If we fail to find a free syncer, I don't follow what is going on w/ choosing a random syncer and setting txid as in below => when we fail to find an idle syncer (one not currently doing a sync), choosing a random syncer and setting txid that way falls back to the behavior before the extra asyncSyncer threads were introduced: when asyncWriter pushes new entries to hdfs before asyncSyncer syncs the previously pushed ones, asyncSyncer gets notified of the newly pushed txid, but that txid will be synced the next time around, after asyncSyncer is done with the current ones (notice we use txidToFlush to record the txid each sync is for, and it can't change during a sync, while writtenTxid can change during a sync). To summarize: the sync operation is the most time-consuming phase. Under the old write model, every write handler issues a separate sync directly for itself (if it does not return early via syncedTillHere),
and under the new write model, the separate threads significantly reduce the lock race, but if there are few concurrent write threads, the benefit of reducing the lock race (fewer write threads, less benefit) can't offset the inefficiency of a single asyncSyncer thread (each time, the asyncSyncer thread can only sync a portion of the writes, and the write handlers which already have their entries in the buffer or pushed to hdfs also need to wait for that sync to complete, and can't proceed until the next sync phase is done). By introducing extra asyncSyncer threads, the correctness of this model stays the same as before: there is still a single asyncWriter thread which pushes buffered entries to hdfs sequentially (txid increases sequentially), and when each asyncSyncer is done, it's guaranteed that all smaller txids have been pushed to hdfs and successfully sync-ed.
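The old-model "return early via syncedTillHere" optimization referred to above can be sketched like this. The class name and counter are illustrative; the real FSHLog check involves more state.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the old write model's only optimization: before issuing its own
// (expensive) hdfs sync, a handler checks whether some other thread's sync
// already covered its txid. Names are illustrative assumptions.
public class SyncedTillHere {
    final AtomicLong syncedTillHere = new AtomicLong();
    int hdfsSyncs = 0; // counts the simulated expensive sync calls

    void sync(long txid) {
        if (txid <= syncedTillHere.get()) return;  // another thread's sync covered us
        hdfsSyncs++;                               // simulate the expensive hdfs sync
        syncedTillHere.accumulateAndGet(txid, Math::max);
    }

    public static void main(String[] args) {
        SyncedTillHere log = new SyncedTillHere();
        log.sync(5); // pays for a real sync, watermark -> 5
        log.sync(3); // already covered, returns early
        log.sync(7); // watermark 5 < 7, pays again
        System.out.println(log.hdfsSyncs); // 2
    }
}
```

As the thread notes, this early return saves a sync only when another handler happens to have synced a larger txid first, which is why it helps much less than hoped under contention.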
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830681#comment-13830681 ] Feng Honghua commented on HBASE-8755: - bq. => yes, under the new thread model there is no explicit method to do the sync, and we can't tell if there are outstanding deferred entries (the AsyncWriter/AsyncSyncer threads do write/sync in a best-effort way) I meant that calling 'sync' can guarantee there are no outstanding deferred entries, but not calling 'sync' after a write can't guarantee there must be some outstanding deferred entries, since they can be sync-ed by the asyncWriter/asyncSyncer threads. This is not the same behavior as under the old write model.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830753#comment-13830753 ] Ted Yu commented on HBASE-8755: --- bq. 2/3 have worse perf, 10 has equal perf but introduces too many extra threads Do you remember how many extra threads were introduced?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830755#comment-13830755 ] stack commented on HBASE-8755: -- bq. Do you remember how many extra threads were introduced? This question is answered above in my last review. Why do you ask?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830356#comment-13830356 ] stack commented on HBASE-8755: -- I ran Feng's script, except I left out trying 200 threads. ||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff|| |1|973.673|1027.039|1119.825|892.997|-15%| |3|1303.891|2300.806|1400.848|2141.560|-7%| |5|855.775|5842.657|873.990|5720.889|-2%| |10|1093.330|9146.370|1090.158|9172.982|0%| |25|1632.263|15316.160|1215.196|20572.813|+25%| |50|2432.653|20553.691|1341.847|37262.070|+45%| |100|4058.650|24638.734|1725.729|57946.527|+57%| This was stock hadoop 2.2 and the tip of hbase trunk. Those are pretty big improvements once we pass ten concurrent threads. A small slowdown with only one writer is acceptable, I'd say. Let me look again at the patch.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830360#comment-13830360 ] stack commented on HBASE-8755: -- Have we looked at using the disruptor here? MPSC seems to be what we have going on, which the disruptor seems pretty well suited to. Looking at the patch now...
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830516#comment-13830516 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack]. A small clarification of why we got such different ops-diff numbers: I used *ops-diff = (new - old) / old* while you used *ops-diff = (new - old) / new*. For example, in the 100-thread result the old ops is 24638 and the new one is 57946, so the ops-diff is *+135.2%* (the new one is 2.35 times the old one), while you got *57%* :-)
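The two formulas are easy to check side by side; here is a tiny standalone sketch (class and method names are mine) applied to the 100-thread row:

```java
public class OpsDiff {
    // Percentage gain relative to the OLD throughput (Feng's formula).
    static double diffVsOld(double oldOps, double newOps) {
        return 100.0 * (newOps - oldOps) / oldOps;
    }

    // Percentage gain relative to the NEW throughput (the formula used in the earlier table).
    static double diffVsNew(double oldOps, double newOps) {
        return 100.0 * (newOps - oldOps) / newOps;
    }

    public static void main(String[] args) {
        // 100-thread row: old = 24638.734 ops, new = 57946.527 ops
        System.out.printf("vs old: %.1f%%%n", diffVsOld(24638.734, 57946.527)); // ~135.2%
        System.out.printf("vs new: %.1f%%%n", diffVsNew(24638.734, 57946.527)); // ~57.5%
    }
}
```

The diff-vs-old form is the conventional speedup measure, which is why the corrected table below reads +135% for 100 threads.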
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830563#comment-13830563 ] stack commented on HBASE-8755: -- Pardon my bad math [~fenghh]. Below is the correction (the numbers look better now).
||Threads||time-wo-patch||ops-wo-patch||time-w-patch||ops-w-patch||ops-diff||
|1|973.673|1027.039|1119.825|892.997|-15%|
|3|1303.891|2300.806|1400.848|2141.560|-7%|
|5|855.775|5842.657|873.990|5720.889|-2%|
|10|1093.330|9146.370|1090.158|9172.982|0%|
|25|1632.263|15316.160|1215.196|20572.813|+34%|
|50|2432.653|20553.691|1341.847|37262.070|+81%|
|100|4058.650|24638.734|1725.729|57946.527|+135%|
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830604#comment-13830604 ] stack commented on HBASE-8755: -- Looking at the patch... Do these no longer pass?
{code}
-assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
+//assertTrue("Should have an outstanding WAL edit", ((FSHLog) log).hasDeferredEntries());
{code}
We have hard-coded 5 asyncSyncers? Why 5? We are starting 7 threads to run this new write model: the 5 syncers, a notifier, and a writer (I suppose we are also removing one, the old LogSyncer -- so 6 new threads). We set the below outside of a sync block:
{code}
+    this.asyncWriter.setPendingTxid(txid);
{code}
Could we set the txid out of order here? If so, will that be a problem? Hmm... it seems not. If we get a lower txid, we just return and prevail w/ the highest we've seen. If we fail to find a free syncer, I don't follow what is going on w/ choosing a random syncer and setting the txid as below (generally the comments in the patch are good -- this looks like a section that could do w/ some):
{code}
+    this.lastWrittenTxid = this.txidToWrite;
+    boolean hasIdleSyncer = false;
+    for (int i = 0; i < asyncSyncers.length; ++i) {
+      if (!asyncSyncers[i].isSyncing()) {
+        hasIdleSyncer = true;
+        asyncSyncers[i].setWrittenTxid(this.lastWrittenTxid);
+        break;
+      }
+    }
+    if (!hasIdleSyncer) {
+      int idx = rd.nextInt(asyncSyncers.length);
+      asyncSyncers[idx].setWrittenTxid(this.lastWrittenTxid);
+    }
{code}
Thanks.
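One plausible reading of the quoted selection logic, restated as standalone code with the comments stack asked for (AsyncSyncer here is a stub of my own, and the rationale in the comments is my guess at the intent, not a statement of the patch's semantics):

```java
import java.util.Random;

public class SyncerChoice {
    // Stand-in for the patch's AsyncSyncer thread; only the fields the
    // selection logic touches are modeled.
    static class AsyncSyncer {
        volatile boolean syncing = false;
        volatile long writtenTxid = -1;
        boolean isSyncing() { return syncing; }
        void setWrittenTxid(long txid) { writtenTxid = txid; }
    }

    // Prefer an idle syncer so the sync can start immediately. If every
    // syncer is busy, hand the txid to a random one anyway: a sync of a
    // higher txid also covers all lower txids, so the chosen syncer can
    // pick up the new target on its next round. Returns the chosen index.
    static int choose(AsyncSyncer[] syncers, long txid, Random rd) {
        for (int i = 0; i < syncers.length; ++i) {
            if (!syncers[i].isSyncing()) {
                syncers[i].setWrittenTxid(txid);
                return i;
            }
        }
        int idx = rd.nextInt(syncers.length);
        syncers[idx].setWrittenTxid(txid);
        return idx;
    }
}
```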
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829631#comment-13829631 ] Feng Honghua commented on HBASE-8755: - [~stack]: I fixed the downgrade for 5 threads and below are the test results; the write throughput capacity after tuning is *about 3.5 times* the one before tuning, while the perf stays almost equal when the thread number is <= 5. (For protobuf/test compatibility reasons I based this on hbase trunk 0.97 r1516083.)
{code}
for i in 1 3 5 10 25 50 100 200; do for j in 1; do ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -path /user/h_fenghonghua/new-thread-v2/ -verify -threads ${i} -iterations 100 -keySize 50 -valueSize 100 > log-patch${i}.${j}.txt; grep Summary: log-patch${i}.${j}.txt; done; done
{code}
||threads||time-without-patch||ops-without-patch||time-with-patch||ops-with-patch||ops-diff||
|1|582|1716|630|1586|-7.5%|
|3|943|3179|951|3153|-0.8%|
|5|820|6091|847|5899|-3.1%|
|10|1141|8760|983|10166|+16%|
|25|1920|13019|1286|19426|+49.2%|
|50|3334|14995|1627|30715|+104.8%|
|100|5312|18824|1925|51943|+185%|
|200|11022|18144|3229|61922|+241.2%|
[~jmspaggi]: I attached patches for the latest trunk (HBASE-8755-trunk-v4.patch) and for the latest 0.96 (HBASE-8755-0.96-v0.patch).
Thanks very much if you can run similar comparison perf tests; please feel free to raise any issues :-)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13829656#comment-13829656 ] stack commented on HBASE-8755: -- I'm starting the tests running now.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820068#comment-13820068 ] Jean-Marc Spaggiari commented on HBASE-8755: [~fenghh], any chance to rebase on the latest 0.96 and optimize for fewer than 5 clients? If so, I can give it a spin on a small cluster and run the PerfEval suite for 24h to see the results.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13820794#comment-13820794 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] Sure, I'm tuning for fewer than 5 threads, and will provide a patch based on the latest 0.96 once the bottleneck is found and the tuning is done.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13808163#comment-13808163 ] Sergey Shelukhin commented on HBASE-8755: - Nice. Thanks!
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807214#comment-13807214 ] Sergey Shelukhin commented on HBASE-8755: - Question from above:
{quote}
...is a separate notifier thread necessary? It doesn't do any blocking operations other than interacting with the flusher thread, or taking the syncedTillHere lock, which looks like it should be uncontested most of the time. Couldn't the flusher thread have the ~4 lines that set syncedTillHere?
{quote}
I apply this new write thread model in HLog and the performance test in our test cluster shows about 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS), the 1 RS write throughput (1K row-size) even beats the one of BigTable (Precolator published in 2011 says Bigtable's write throughput then is 31002). I can provide the detailed performance test results if anyone is interested. The change for new write thread model is as below: 1 All put handler threads append the edits to HLog's local pending buffer; (it notifies AsyncWriter thread that there is new edits in local buffer) 2 All put handler threads wait in HLog.syncer() function for underlying threads to finish the sync that contains its txid; 3 An single AsyncWriter thread is responsible for retrieve all the buffered edits in HLog's local pending buffer and write to the hdfs (hlog.writer.append); (it notifies AsyncFlusher thread that there is new writes to hdfs that needs a sync) 4 An single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist the writes by AsyncWriter; (it notifies the AsyncNotifier thread that sync watermark increases) 5 An single AsyncNotifier thread is responsible for notifying all pending put handler threads which are waiting in the HLog.syncer() function 6 No LogSyncer thread any more (since there is always AsyncWriter/AsyncFlusher threads do the same job it does) -- This message was sent by Atlassian JIRA (v6.1#6144)
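For contrast with the pipeline, the per-write locking cycle of the current model described above can be sketched as follows. This is a simplified illustration assuming the described updateLock/flushLock structure and the syncedTillHere short-circuit, not the actual FSHLog code.

```java
/**
 * Sketch of the old model: every handler serializes on updateLock for the
 * append and on flushLock for the sync, once per write. The syncedTillHere
 * check is the (weak) optimization the description mentions: skip the sync
 * if another handler's sync already covered our txid. Illustrative only.
 */
public class OldWriteModelSketch {
    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    private long txid = 0;                   // guarded by updateLock
    private volatile long syncedTillHere = 0;

    public long put(String edit) {
        long myTxid;
        synchronized (updateLock) {          // contention point 1: serialized appends
            myTxid = ++txid;
            // writer.append(edit): buffered, cheap
        }
        if (syncedTillHere >= myTxid) {
            return myTxid;                   // someone else's sync already persisted our edit
        }
        synchronized (flushLock) {           // contention point 2: serialized syncs
            if (syncedTillHere < myTxid) {
                // writer.sync(): the expensive hflush to the datanode pipeline
                syncedTillHere = myTxid;
            }
        }
        return myTxid;
    }

    public static void main(String[] args) throws Exception {
        OldWriteModelSketch log = new OldWriteModelSketch();
        Thread[] handlers = new Thread[4];
        for (int i = 0; i < handlers.length; i++) {
            handlers[i] = new Thread(() -> {
                for (int j = 0; j < 25; j++) log.put("edit");
            });
            handlers[i].start();
        }
        for (Thread t : handlers) t.join();
        System.out.println("appended=" + log.txid);
    }
}
```

The short-circuit only helps a handler whose sync happened to be covered while it was between the two locks, which is why it buys much less than dedicated writer/flusher threads that batch by construction.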
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13807618#comment-13807618 ] Hangjun Ye commented on HBASE-8755: --- [~sershe], that's a very good question. We found syncedTillHere.notifyAll() was a blocking operation and it might take a long time if many threads were waiting on it (which also surprised us). So we put this logic in a separate thread.
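The fix described in the comment above, moving the wakeup into a dedicated thread so the flusher never stalls on the waiters' monitor, can be sketched with a queue handoff. The queue-based handoff and all names here are assumptions for illustration, not the actual HLog implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Instead of the flusher calling notifyAll() itself (and stalling while
 * many waiters are woken), it drops the new watermark on a queue and a
 * dedicated notifier thread performs the wakeup. Illustrative sketch only.
 */
public class NotifierHandoff {
    private final BlockingQueue<Long> handoff = new LinkedBlockingQueue<>();
    private final Object waiters = new Object();
    private long syncedTillHere = 0; // guarded by waiters

    public void startNotifier() {
        Thread notifier = new Thread(() -> {
            try {
                while (true) {
                    long mark = handoff.take();       // handed over by the flusher
                    synchronized (waiters) {
                        if (mark > syncedTillHere) syncedTillHere = mark;
                        waiters.notifyAll();          // the potentially slow wakeup happens here,
                    }                                 // off the flusher's critical path
                }
            } catch (InterruptedException ignored) { }
        });
        notifier.setDaemon(true);
        notifier.start();
    }

    /** Flusher side: publish a completed sync without touching the monitor. */
    public void publishSync(long txid) {
        handoff.add(txid);                            // non-blocking on an unbounded queue
    }

    /** Handler side: the wait inside HLog.syncer(). */
    public long waitForSync(long txid) throws InterruptedException {
        synchronized (waiters) {
            while (syncedTillHere < txid) waiters.wait();
            return syncedTillHere;
        }
    }

    public static void main(String[] args) throws Exception {
        NotifierHandoff h = new NotifierHandoff();
        h.startNotifier();
        Thread handler = new Thread(() -> {
            try {
                System.out.println("handler saw watermark=" + h.waitForSync(1));
            } catch (InterruptedException ignored) { }
        });
        handler.start();
        Thread.sleep(50);   // give the handler time to start waiting (demo only)
        h.publishSync(1);   // flusher reports txid 1 as synced
        handler.join();
    }
}
```

The design point is that notifyAll() on a monitor with many waiters does real work proportional to the number of waiters, so keeping it off the flusher lets the next sync start immediately.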
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13803932#comment-13803932 ] stack commented on HBASE-8755: -- Five datanodes with fusionio. hbase tip of 0.96 branch and hadoop-2.1.0-beta. HLogPE on master node.

||Threads||w/o patch time||w/o patch ops||w/ patch time||w/ patch ops||
|1|1048.033s|954.168ops/s|1100.423s|908.741ops/s|
|1|1042.126s|959.577ops/s|1156.557s|864.635ops/s|
|1|1052.601s|950.028ops/s|1143.271s|874.683ops/s|
|5|904.176s|5529.896ops/s|1916.229s|2609.292ops/s|
|5|910.469s|5491.675ops/s|1911.841s|2615.280ops/s|
|5|925.778s|5400.863ops/s|1970.565s|2537.344ops/s|
|50|2699.752s|18520.221ops/s|1889.877s|26456.748ops/s|
|50|2689.678s|18589.586ops/s|1922.716s|26004.881ops/s|
|50|2711.144s|18442.398ops/s|1893.439s|26406.977ops/s|
|75|4945.563s|15165.108ops/s|1997.553s|37545.938ops/s|
|75|4852.779s|15455.063ops/s|1992.425s|37642.570ops/s|
|75|4921.685s|15238.684ops/s|-|-|
|100|6224.527s|16065.479ops/s|2086.691s|47922.766ops/s|
|100|6195.727s|16140.156ops/s|2091.869s|47804.145ops/s|

Diffs are small with 1 thread only. It's bad at 5 threads, but thereafter the patch starts to shine. If we could make the 5-thread case better, we could commit this patch.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13804957#comment-13804957 ] Feng Honghua commented on HBASE-8755: - [~stack] thanks a lot for the detailed perf test. Agree with you; we'll try to make the 5-thread case better.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792389#comment-13792389 ] Feng Honghua commented on HBASE-8755: - [~stack] thanks a lot :-) btw: the HLogPE withoutPatch/withPatch test result against 0.94.3 (our internal branch), done by [~liushaohui], matches the one against the hdfs sequence file described in the 'Description' of this JIRA, as below (17K vs. 68K):
bq. Three of my colleagues(Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence file and the prototype implementation shows a 4X improvement for throughput (from 17000 to 7+).
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792422#comment-13792422 ] stack commented on HBASE-8755: -- Does that mean we can just use HLogPE going forward to evaluate this patch and not have to set up a cluster? Because it gives the same results as a cluster does? Thanks [~fenghh]
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792430#comment-13792430 ] Feng Honghua commented on HBASE-8755: - [~stack]: Agree. It seems HLogPE is enough for profiling/evaluating; shall we run YCSB on a cluster later for double verification? :-)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792433#comment-13792433 ] Hangjun Ye commented on HBASE-8755: --- To clarify, an hdfs cluster is still needed to store the sequence file.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792279#comment-13792279 ] Liu Shaohui commented on HBASE-8755: [~stack] [~fenghh] [~yehangjun] We redid the same HLogPE comparison test using hbase trunk. Results follow.
Test env: hdfs: cdh 4.1.0, five datanodes, each node with 12 SATA disks. hbase: hbase trunk 0.97 r1516083 (in r1516084, trunk updated protobuf to 2.5, which is incompatible with the protobuf 2.4 used in cdh 4.1.0). HLogPE is run on one of these datanodes, so one replica of the hlog's block will be on the local datanode.

||Thread number||Time without Patch(s)||Ops without Patch||Time with Patch(s)||Ops with Patch||Time diff %||Ops diff %||
|1|580.309|1723.22|624.709|1600.745|-7.65|-7.11|
|1|591.177|1691.541|631.34|1583.932|-6.79|-6.36|
|1|591.948|1689.338|634.518|1575.999|-7.19|-6.71|
|5|794.034|6296.959|1201.563|4161.247|-51.32|-33.92|
|5|781.033|6401.778|1191.776|4195.419|-52.59|-34.46|
|5|805.597|6206.577|1187.179|4211.665|-47.37|-32.14|
|50|3222.659|15515.139|1815.586|27539.316|43.66|77.50|
|50|3191.131|15668.426|1821.956|27443.033|42.91|75.15|
|50|3222.407|15516.352|1817.754|27506.473|43.59|77.27|
|75|4517.149|16603.393|2024.359|37048.766|55.19|123.14|
|75|4498.987|16670.42|2016.899|37185.797|55.17|123.06|
|75|4554.122|16468.598|2037.155|36816.051|55.27|123.55|
|100|5186.292|19281.598|2147.581|46564.016|58.59|141.49|
|100|5181.344|19300.012|2135.768|46821.563|58.78|142.60|
|100|5189.396|19270.064|2143.529|46652.039|58.69|142.10|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792293#comment-13792293 ] stack commented on HBASE-8755: -- [~fenghh] Our results are the same?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792301#comment-13792301 ] Feng Honghua commented on HBASE-8755: - [~stack]: Yes, seems this patch against trunk has obvious downgrade compared to against 0.94.3(our internal branch) for: 1) 5 threads: withPatch has worse perf than withoutPatch (33% ops downgrade, 4.2K vs. 6.3K) 2) 100 threads: withpatch's perf is about 2.5X of withoutPatch (46.6K vs. 19.2K) A short summary: 1) withoutPatch, the max ops of HLog is less than 20K (19K for trunk and 17K for 0.94.3) 2) withPatch, the max ops of HLog is more than 45K (46K for trunk and 68K for 0.94.3) 3) for trunk, withPatch can have even worse perf than withoutPatch (about 33% downgrade) We'll try to figure out why withPatch performs worse than withoutPatch for trunk, and try to ensure the performance is about equal when stress is low and still keep obvious upgrade when stress is high. :-) [~stack] : would you please redo the test using 75/100 threads to re-confirm whether the ops upgrade matches our tests? (we see 37K vs 16K for 75 threads and 46K vs 19K for 100 threads) [~zjushch] : what version of HBase do you apply the patch to? trunk or 0.94? 
I wonder if the reason for our difference is the same as for stack's.

A new write thread model for HLog to improve the overall HBase write throughput --- Key: HBASE-8755 URL: https://issues.apache.org/jira/browse/HBASE-8755 Project: HBase Issue Type: Improvement Components: Performance, wal Reporter: Feng Honghua Assignee: stack Priority: Critical Fix For: 0.96.1 Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch

In the current write model, each write handler thread (executing put()) individually goes through a full 'append (hlog local buffer) -> HLog writer append (write to hdfs) -> HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy contention on updateLock and flushLock. The only existing optimization, checking the current syncTillHere txid in the hope that another thread has already written/synced its txid to hdfs so that the write/sync can be skipped, helps much less than expected. Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write thread model for writing hdfs sequence files, and the prototype implementation shows a 4X throughput improvement (from 17000 to 7+). I applied this new write thread model in HLog, and the performance test in our test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) even beats that of BigTable (the Percolator paper published in 2011 says Bigtable's write throughput then was 31002). I can provide the detailed performance test results if anyone is interested.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792313#comment-13792313 ] stack commented on HBASE-8755: -- [~fenghh] will do. Sorry. Meant to do it earlier. Will try to do some profiling too myself to see why it's worse when the thread count is low.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779581#comment-13779581 ] Liu Shaohui commented on HBASE-8755: [~stack] [~fenghh] We redid the comparison test using HLogPE. Here are the results. Test env: hdfs: cdh 4.1.0, five datanodes, each node with 12 SATA disks; hbase: 0.94.3. HLogPE is run on one of these datanodes, so one replica of the hlog's block will be on the local datanode. The params of HLogPE are -iterations 100 -keySize 50 -valueSize 100, the same as in stack's tests.
{code}
for i in 1 5 50 75 100; do
  for j in 1 2 3; do
    ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -keySize 50 -valueSize 100 log-patch${i}.${j}.txt;
    grep Summary: log-patch${i}.${j}.txt;
  done;
done
{code}
||Thread Count||WithoutPatch (s)||WithPatch (s)||Time diff %||
|1|579.380|625.937|-8.03|
|1|580.307|630.346|-8.62|
|1|577.853|654.205|-13.21|
|5|799.579|785.696|1.73|
|5|795.013|780.642|1.80|
|5|826.270|781.909|5.36|
|50|3290.482|1165.773|64.57|
|50|3298.387|1167.992|64.58|
|50|3224.495|1154.921|64.18|
|75|4450.760|1253.448|71.83|
|75|4506.143|1269.806|71.82|
|75|4516.453|1245.954|72.41|
|100|5561.074|1493.102|73.15|
|100|5616.810|1496.263|73.36|
|100|5612.268|1468.500|73.83|
a, When the thread number is 1, the performance of our test is about 40% better than that of stack's test, in both the old and the new thread model. [~stack], what is the hdfs version in your test, or are there special configs? b, When the thread number is 5, we do not see the ~50% regression.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13779607#comment-13779607 ] Liu Shaohui commented on HBASE-8755: updated the result table and added the ops diff
||Thread number||Time without Patch||Ops without Patch||Time with Patch||Ops with Patch||Time diff %||Ops diff %||
|1|579.38|1725.983|625.937|1597.605|-8.04|-7.44|
|1|580.307|1723.226|630.346|1586.43|-8.62|-7.94|
|1|577.853|1730.544|654.205|1528.573|-13.21|-11.67|
|5|799.579|6253.291|785.696|6363.785|1.74|1.77|
|5|795.013|6289.206|780.642|6404.984|1.81|1.84|
|5|826.27|6051.291|781.909|6394.606|5.37|5.67|
|50|3290.482|15195.343|1165.773|42890|64.57|182.26|
|50|3298.387|15158.925|1167.992|42808.516|64.59|182.40|
|50|3224.495|15506.304|1154.921|43293.004|64.18|179.20|
|75|4450.76|16851.055|1253.448|59834.953|71.84|255.08|
|75|4506.143|16643.945|1269.806|59064.141|71.82|254.87|
|75|4516.453|16605.951|1245.954|60194.84|72.41|262.49|
|100|5561.074|17982.137|1493.102|66974.656|73.15|272.45|
|100|5616.81|17803.699|1496.263|66833.172|73.36|275.39|
|100|5612.268|17818.107|1468.5|68096.695|73.83|282.18|
Time diff = (Time without Patch - Time with Patch) / Time without Patch * 100
Ops diff = (Ops with Patch - Ops without Patch) / Ops without Patch * 100
[~stack] What are the hdfs and hbase versions of your test? We may redo the tests in a cluster with the same hdfs and hbase versions as yours.
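As a sanity check on the two formulas above, here is a small self-contained computation reproducing the last table row (100 threads); the `DiffCheck` class name is invented for illustration and the values are copied from the table:

```java
// Worked example of the Time diff / Ops diff formulas, using the last table row.
public class DiffCheck {
    // Time diff = (Time without Patch - Time with Patch) / Time without Patch * 100
    static double timeDiff(double timeWithout, double timeWith) {
        return (timeWithout - timeWith) / timeWithout * 100;
    }

    // Ops diff = (Ops with Patch - Ops without Patch) / Ops without Patch * 100
    static double opsDiff(double opsWithout, double opsWith) {
        return (opsWith - opsWithout) / opsWithout * 100;
    }

    public static void main(String[] args) {
        double t = timeDiff(5612.268, 1468.5);    // -> 73.83 (the patched run finishes 73.83% faster)
        double o = opsDiff(17818.107, 68096.695); // -> 282.18 (about 2.8x more ops/s)
        System.out.printf("time diff %.2f%%, ops diff %.2f%%%n", t, o);
    }
}
```

Note the two formulas have opposite orientation: a positive Time diff means the patched run took less time, while a positive Ops diff means it achieved more ops/s, so both read as "bigger is better".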
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774603#comment-13774603 ] Feng Honghua commented on HBASE-8755: - bq. If I weren't in Germany right now with no VPN access I'd offer to load 0.94 with the patch onto one of our test clusters and run some workload on it...
[~lhofhansl] Wondering if you now have access to your test clusters; if possible, would you please also help do the performance comparison test on them? (I'm confused why no one else gets a comparable improvement in their test results, while we get results with almost the same improvement across multiple tests.)
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13775651#comment-13775651 ] stack commented on HBASE-8755: -- bq. What was the implication of not shutting this? Were tests failing or is this just make-work?
I did not save avg ops/s. It was a set amount of work.
bq. 2. is the write load against a single node, or five nodes? – to confirm the throughput is per-node or per-cluster (with five nodes)
I had a client writing to a WAL hosted in hdfs on a 5-node HDFS cluster. Thanks [~fenghh]
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771714#comment-13771714 ] Hangjun Ye commented on HBASE-8755: --- Thanks [~stack] for doing this experiment! The result looks positive in some ways. China is on holiday (Mid-Autumn Festival) until this Saturday; we will check your suggestion after the holiday.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772189#comment-13772189 ] stack commented on HBASE-8755: -- [~fenghh] Happy holidays! I didn't profile to see where the time is spent. That would be next, I'd say.
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770888#comment-13770888 ] stack commented on HBASE-8755: -- Just parking the numbers here. It takes the same time to write 5x as much (5M entries by 5 threads) as 1M entries by 1 thread; 50x takes 6x longer (50 threads writing 50M entries).
{code}
/tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 991.258s 1008.819ops/s
/tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 924.881s 1081.220ops/s
/tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 979.312s 1021.125ops/s
/tmp/log-patch50.1.txt:2013-09-17 23:29:45,188 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2960.095s 16891.350ops/s
/tmp/log-patch50.2.txt:2013-09-18 00:22:48,849 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2924.844s 17094.930ops/s
/tmp/log-patch50.3.txt:2013-09-18 01:15:58,646 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=50, iterations=100 took 2927.617s 17078.736ops/s
/tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 950.968s 5257.800ops/s
/tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 939.312s 5323.045ops/s
/tmp/log-patch5.3.txt:2013-09-17 22:39:56,011 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 947.183s 5278.811ops/s
{code}
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13771587#comment-13771587 ] stack commented on HBASE-8755: -- I was wrong that HLogPE called doWrite. It calls append as an HRegionServer would. Here are the numbers. They confirm what [~zjushch] found way back up top of this issue. I was running HLogPE against a five-node HDFS cluster. The DataNodes were persisting to a fusionio drive, so there was little friction at the drive. Below are the elapsed seconds running the following: {code} $ for i in 1 5 50; do for j in 1 2 3; do ./bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -nocleanup -keySize 50 -valueSize 100 /tmp/log-patch${i}.${j}.txt; done; done {code}
||Thread Count||WithoutPatch||WithPatch||%diff||
|1|991.258|1125.208|-11.90|
|1|924.881|1137.754|-18.70|
|1|979.312|1142.959|-14.31|
|5|950.968|1914.448|-50.32|
|5|939.312|1918.188|-51.03|
|5|947.183|1939.806|-51.17|
|50|2960.095|1918.808|54.26|
|50|2924.844|1933.020|51.30|
|50|2927.617|1955.358|49.72|
So: about 20% slower for single-threaded writes, about 50% slower with five threads writing concurrently, BUT about 50% faster with 50 concurrent threads (our current default). Can we somehow have the best of both worlds, where we switch to this new model under high contention?
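One way to get the "best of both worlds" asked about above could be a simple contention probe. This is entirely hypothetical (not from any attached patch): count the handler threads currently inside the sync path and only hand off to the batching AsyncWriter/AsyncFlusher pipeline once that count crosses a threshold, syncing inline otherwise:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical contention probe: callers bracket their sync with
// enterSync()/exitSync(); useBatchedPath() says whether enough handlers are
// concurrently syncing to make the batching pipeline pay for its latency.
class AdaptiveSyncChooser {
    private final AtomicInteger inFlightSyncers = new AtomicInteger();
    private final int handoffThreshold;

    AdaptiveSyncChooser(int handoffThreshold) {
        this.handoffThreshold = handoffThreshold;
    }

    // True: hand the txid to the batching AsyncWriter/AsyncFlusher path.
    // False: an inline append+sync is likely cheaper at this contention level.
    boolean useBatchedPath() {
        return inFlightSyncers.get() >= handoffThreshold;
    }

    void enterSync() {
        inFlightSyncers.incrementAndGet();
    }

    void exitSync() {
        inFlightSyncers.decrementAndGet();
    }
}
```

The threshold would have to be tuned empirically; the numbers above suggest the crossover sits somewhere between 5 and 50 concurrent writers on that cluster.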
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13770455#comment-13770455 ] stack commented on HBASE-8755: -- Trying this on hdfs. It takes WAY longer. Threads stuck here: {code} t1 prio=10 tid=0x7f49fd4fc800 nid=0xfc4 in Object.wait() [0x7f49d0d99000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1795) - locked 0x000423dd1cc8 (a java.util.LinkedList) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1689) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1582) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1567) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:135) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1072) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1195) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:910) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:844) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.append(FSHLog.java:838) at org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation$HLogPutBenchmark.run(HLogPerformanceEvaluation.java:110) at java.lang.Thread.run(Thread.java:662) {code} Here are numbers I have so far for WITHOUT patch: /tmp/log-patch1.1.txt:2013-09-17 21:19:31,495 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 991.258s 1008.819ops/s /tmp/log-patch1.2.txt:2013-09-17 21:35:04,715 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100 took 924.881s 1081.220ops/s /tmp/log-patch1.3.txt:2013-09-17 21:51:32,416 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, 
iterations=100 took 979.312s 1021.125ops/s /tmp/log-patch5.1.txt:2013-09-17 22:07:31,712 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 950.968s 5257.800ops/s /tmp/log-patch5.2.txt:2013-09-17 22:23:39,680 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100 took 939.312s 5323.045ops/s Will clean up later, but the write rate is constant whether 1 or 5 threads. Will see when 50. Looks like I need to modify HLogPE. It calls FSHLog#doWrite directly, which is not as interesting since it bypasses the locks in FSHLog when it does not call append. Will be back.
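The logged figures are internally consistent, assuming each thread did 1M puts as stated elsewhere in this thread: ops/s is just total puts divided by elapsed seconds. A trivial cross-check:

```java
// Sanity check of the logged throughput lines, assuming 1M puts per thread:
// ops/s = total puts / elapsed seconds.
class ThroughputCheck {
    static double opsPerSec(long totalPuts, double elapsedSeconds) {
        return totalPuts / elapsedSeconds;
    }
}
```

For the first single-thread run, opsPerSec(1_000_000, 991.258) gives roughly 1008.819, matching the logged "1008.819ops/s".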
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761563#comment-13761563 ] Feng Honghua commented on HBASE-8755: - Thanks [~stack]. Looking forward to your test results on hdfs.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761065#comment-13761065 ] stack commented on HBASE-8755: -- Starting in on taking a look at this. I tried HLogPerformanceEvaluation; going via ycsb would seem to add noise and indirection. I ran with 1, 5, and 50 threads w/ sizes that are like Honghua's (key of 50 and value of 150). I ran the test like this on trunk using localfs: {code} $ for i in 1 5 50; do for j in 1 2 3; do ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -verify -threads ${i} -iterations 100 -nocleanup -verbose -keySize 50 -valueSize 100 /tmp/log-patch${i}.${j}.txt; done; done {code} Needs fixes over in HBASE-9460 for the above to work on localfs. With localfs, the patch is twice as slow. localfs does not support sync, so this is probably what makes the difference -- the extra threads do better w/ the dfsclient's stutter around sync. Let me try up on hdfs next. Below are the numbers. I ran each test three times: i.e. three times without the patch with one thread, then three times with the patch and one thread, etc. Each thread did 1M puts. The table is the time for the test to complete in seconds, so less is better.
||Thread Count||WithoutPatch||WithPatch||
|1|4.895|29.088|
|1|4.856|29.544|
|1|4.901|29.326|
|5|24.173|53.974|
|5|24.013|55.976|
|5|23.390|55.858|
|50|253.773|454.147|
|50|247.095|443.215|
|50|254.044|449.043|
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721960#comment-13721960 ] Jean-Marc Spaggiari commented on HBASE-8755: Ok. I did some other tries and here are the results.
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2.txt
421428.8
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.1.2-8755.txt
427172.1
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0.txt
419673.3
jmspaggi@hbasetest:~/hbase/hbase-$ cat output-1.2.0-8755.txt
432413.9
This is elapsed time. Between each iteration I totally delete (rm -rf) the hadoop directories, stop all the java processes, etc. The test is a 10M randomWrite. So unfortunately I have not been able to see any real improvement. For YCSB, is there any specific load I should run to be able to see something better than without 8755? I guess it's a write-intensive load that we want? Also, I have tested this on a pseudo-distributed instance (no longer a standalone one), but I can dedicate 4 nodes to a test if required...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721973#comment-13721973 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi], thanks for your test. Some questions about it: Is it against real HDFS? How many data-nodes and RS? What's the write pressure (client number, write thread number)? What total throughput do you get? Yes, this jira aims for a throughput improvement under write-intensive load; it should be tested and verified under write-intensive load against a real cluster / HDFS environment. And as you can see, this jira only refactors the write thread model; it does not tune any write sub-phase along the whole write path for an individual write request, so no obvious improvement is expected under low/ordinary write pressure. If you have a real cluster environment with 4 data-nodes, it would be better to re-do the test chunhui/I did with a similar test configuration/load, which is listed in detail in the comments above. 1 client with 200 write threads is OK for pressing a single RS, and 4 clients each with 200 write threads for pressing 4 RS. Thanks again.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722098#comment-13722098 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], I ran in pseudo-distributed mode for all the tests, which means HBase against a real HDFS. Tests were with 1.1.2 and 1.2.0. The test was a 10 million randomWrite test; I ran 10 of them. I will re-run some tests against a single node (still in the process of installing the OS on the other one) and will put more load against it. Like you said, 200 threads doing 100,000 writes each... More to come...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718252#comment-13718252 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], I was in the process of giving it a try. Is there a new version for 0.94 coming, or is the one attached to the JIRA already the good one?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718270#comment-13718270 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi] HBASE-8755-0.94-V1.patch is the good one. Let me know if you hit any problem. Thanks Jean-Marc.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13711711#comment-13711711 ] Tianying Chang commented on HBASE-8755: --- [~zjushch] In your test, you used 300 RPC handlers. Since the default is 10, and we use 50 in our production cluster, when do we really need that many RPC handlers? Will that be more overhead than benefit? Do you see write throughput go up when increasing the number of handlers? Any particular scenario where a high number of handlers helps?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708433#comment-13708433 ] Jean-Marc Spaggiari commented on HBASE-8755: Hi [~fenghh], the servers dedicated to those tests were too different, so I think it would not have been a good idea to run on them (difficult to interpret the results). So I just bought 3 absolutely identical nodes (the same as one I already have). By the end of the week I will have 4 servers with the same MB, same CPU, same memory (brand, PN, etc.) and same hard drive! The master will have 1xSSD+1xSATA, the others will have 2xSATA. To start, I will run YCSB on that with and without this patch as soon as I get the hardware and install the OS. Should be sometime later this week. I will also see if I can run PE. More to come.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708473#comment-13708473 ] Feng Honghua commented on HBASE-8755: - Thanks a lot [~jmspaggi], looking forward to your results.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13708189#comment-13708189 ] Feng Honghua commented on HBASE-8755: - [~jmspaggi], what's your result of running YCSB against a real cluster environment?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691138#comment-13691138 ] Jean-Marc Spaggiari commented on HBASE-8755: It was PerformanceEvaluation and not YCSB. And I ran it with only one thread. It's now running with 10 threads. I will have the results tomorrow I think, or maybe by end of day if it's going fast enough. I will run YCSB after that, but I am not sure it's relevant in standalone mode. I will still try. And very soon add another dedicated test node...
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13691178#comment-13691178 ] Hangjun Ye commented on HBASE-8755: --- It possibly doesn't make a significant difference in standalone mode. The point of this patch is to avoid heavy race conditions among multiple handler threads; we found many race conditions happened in the HDFS client (when many threads called write/hflush concurrently). We tested on a single RS with a 5-data-node underlying HDFS cluster using YCSB; would you like to have a try on a similar environment?
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690238#comment-13690238 ] Lars Hofhansl commented on HBASE-8755: -- I ran some local tests, and even with a single-threaded client I did not see a performance degradation with this patch (0.94). I inserted 10m small rows. The cluster consists of a single RS/DN, and HDFS is backed by an SSD. So the HDFS cost of writes should be low, and that in turn should bring out any new inefficiencies introduced by this patch. So from my angle this patch is good.
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690259#comment-13690259 ] Jean-Marc Spaggiari commented on HBASE-8755: Here are my results:
||Test||Trunk||8755||
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|437101.8|431561.7|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|774306.8|768559|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|21999.7|22277.6|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|134429.3|134370.9|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|112814.7|125324.2|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|78574.42|80064.5|
These numbers are times, so the smaller, the better. Overall the results are pretty much the same, except for the random writes, which are 11% slower:
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|1.27%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|0.74%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|-1.26%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|0.04%|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|-11.09%|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|-1.90%|
I will redo the RandomWrite tests and try them with 1, 10 and 100 threads to see if only mono-threaded access is impacted...
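The percentage deltas in the second table above can be reproduced from the raw times in the first as (trunk - 8755) / trunk * 100, where a positive value means the patched build is faster (times are smaller). A quick sketch; the class name PeDelta and the shortened test labels are invented for illustration:

```java
// Recompute the per-test deltas from the raw PerformanceEvaluation times.
public class PeDelta {
    public static void main(String[] args) {
        String[] names = {"FilteredScan", "RandomRead", "RandomScanWithRange100",
                          "RandomSeekScan", "RandomWrite", "SequentialWrite"};
        double[][] times = { // {trunk, 8755} in the same units as the table
            {437101.8, 431561.7}, {774306.8, 768559.0}, {21999.7, 22277.6},
            {134429.3, 134370.9}, {112814.7, 125324.2}, {78574.42, 80064.5}
        };
        for (int i = 0; i < names.length; i++) {
            double pct = (times[i][0] - times[i][1]) / times[i][0] * 100;
            System.out.printf("%s: %.2f%%%n", names[i], pct); // positive = patch faster
        }
    }
}
```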
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690288#comment-13690288 ] Lars Hofhansl commented on HBASE-8755: -- Interesting. For writing the WAL it should not make any difference whether we write sequentially or at random. A possible reason is that the batching there is less efficient; maybe that brings out the extra thread hand-off needed here. (My test was the equivalent of a sequential write, but using a handcoded test.)
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13690375#comment-13690375 ] stack commented on HBASE-8755: -- [~jmspaggi] Thanks boss. What test is this? YCSB w/ how many clients? Thank you sir.
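For contrast, the old model's contention described above can be sketched as below. This is a hypothetical mini-model, not HBase code: the names mirror the fields mentioned in the description (updateLock, flushLock, syncTillHere, here spelled syncedTillHere), but the "append" and "sync" bodies are simulated. Every handler serializes on both locks for every put, and the only relief is skipping the sync when the watermark already covers its txid.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the old per-handler cycle: each put contends on updateLock to
// append and on flushLock to sync; a sync persists everything appended so far.
public class OldWalModelSketch {
    private final Object updateLock = new Object();
    private final Object flushLock = new Object();
    private final AtomicLong nextTxid = new AtomicLong();
    private long syncedTillHere = 0;  // guarded by flushLock
    private long syncCalls = 0;       // guarded by flushLock

    void put() {
        long txid;
        synchronized (updateLock) {        // 'append' step: all handlers contend here
            txid = nextTxid.incrementAndGet();
        }
        synchronized (flushLock) {         // 'sync' step: all handlers contend here too
            if (syncedTillHere < txid) {   // the only optimization: skip if already synced
                syncedTillHere = nextTxid.get();  // sync covers all appends so far
                syncCalls++;
            }
        }
    }

    // Returns {final watermark, number of actual sync calls}.
    static long[] run(int threads, int putsPerThread) throws InterruptedException {
        OldWalModelSketch wal = new OldWalModelSketch();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < putsPerThread; j++) wal.put(); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return new long[] { wal.syncedTillHere, wal.syncCalls };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run(8, 1000);
        System.out.println("synced=" + r[0] + " syncCalls=" + r[1]);
    }
}
```

With low concurrency nearly every put pays for its own sync; the skip only helps when many handlers pile up behind flushLock, which matches the observation that this optimization "helps much less than expected".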
[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
[ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13688936#comment-13688936 ] Feng Honghua commented on HBASE-8755: - [~zjushch] We ran the same tests as yours; the results are below:
1) One YCSB client with 5/50/200 write threads respectively
2) One RS with 300 RPC handlers, 20 regions (5 data-node back-end HDFS running CDH 4.1.1)
3) row-size = 150 bytes

threads  row-count  new-throughput  new-latency  old-throughput  old-latency
----------------------------------------------------------------------------
5        20         3191            1.551(ms)    3172            1.561(ms)
50       200        23215           2.131(ms)    7437            6.693(ms)
200      200        35793           5.450(ms)    10816           18.312(ms)

A) The difference is negligible with 5 YCSB client threads;
B) The new model still shows a 3X+ improvement over the old model at 50/200 threads.

Can anybody else help run similar tests using the same test configuration as Chunhui?