[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921846#comment-13921846
 ] 

stack commented on HBASE-10659:
---

bq. I don't fully understand your last question.

It's ok.  You answered it.  The way we do the client response is different between 
trunk and 0.89.  We went a different route in trunk to '...reduce the thread 
interleaving...', with handlers coming together on a ring buffer, one 
thread pulling from the ring, and then multiple sync'ing threads syncing.  
Handlers hang out stuck on a latch till their sync clears.  We need to do 
as you lads do and have the seqid be the mvcc up in the memstore.
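
For contrast with the proposal below, here is a minimal Java sketch of the blocking pattern described in this comment: a handler publishes its edit and then parks on a latch until the sync clears. It is an illustration only, not trunk's code. The names PendingEdit, BlockingWritePath, write and startSyncer are hypothetical, a bounded queue stands in for the ring buffer, and a single consumer thread stands in for both the ring consumer and the syncing threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

/** Hypothetical pending edit; the handler waits on `synced` until released. */
final class PendingEdit {
    final String edit;
    final CountDownLatch synced = new CountDownLatch(1);
    PendingEdit(String edit) { this.edit = edit; }
}

/** Illustration only: a bounded queue stands in for the ring buffer. */
final class BlockingWritePath {
    private final BlockingQueue<PendingEdit> ring = new ArrayBlockingQueue<>(1024);

    /** Handler side: publish the edit, then block until its sync clears. */
    boolean write(String edit) {
        PendingEdit pending = new PendingEdit(edit);
        try {
            ring.put(pending);
            pending.synced.await(); // the handler thread is stuck here
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Consumer side: one daemon thread takes edits and releases the waiting handlers. */
    Thread startSyncer() {
        Thread syncer = new Thread(() -> {
            try {
                while (true) {
                    PendingEdit pending = ring.take();
                    // A real implementation would append to the WAL and sync here.
                    pending.synced.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "wal-syncer");
        syncer.setDaemon(true);
        syncer.start();
        return syncer;
    }
}
```

In this shape the number of in-flight writes is bounded by the number of handler threads, since each handler is occupied until its own sync completes.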

 [89-fb] Optimize the threading model in HBase write path

 Key: HBASE-10659
 URL: https://issues.apache.org/jira/browse/HBASE-10659
 Project: HBase
 Issue Type: New Feature
 Reporter: Liyin Tang

 Recently, we built several prototypes to optimize the HBase (0.89) write 
 path. Based on the simulator results, the following model achieves much 
 higher overall throughput with fewer threads.

 IPC Writer Thread Pool: 
 IPC handler threads prepare all Put requests and append the WALEdit, as 
 one transaction, to a concurrent collection while holding a read lock. Then 
 they simply return.

 HLogSyncer Thread:
 Each HLogSyncer thread corresponds to one HLog stream. It swaps the 
 concurrent collection under a write lock, then iterates over all the 
 elements in the previous collection, generates the sequence id for each 
 transaction, and writes it to the HLog. After the HLog sync is done, it 
 appends these transactions as a batch to a blocking queue. 

 Memstore Update Thread:
 The memstore update thread polls the blocking queue and updates the 
 memstore for each transaction, using the sequence id as the MVCC number. 
 Once the memstore update is done, it dispatches to the responder thread 
 pool to return to the client.

 Responder Thread Pool:
 The responder thread pool returns the RPC calls in parallel. 

 We are still evaluating this model and will share more results/numbers once 
 they are ready. We would really appreciate any comments in advance!
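
The handoff between the IPC handlers and an HLogSyncer described above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the 0.89-fb code: Txn, WalBuffer, append and syncOnce are hypothetical names, a String stands in for a WALEdit, and the actual HLog write and sync are reduced to comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for one Put plus its WALEdit. */
final class Txn {
    final String edit;
    long seqId = -1; // assigned by the syncer, later reusable as the MVCC number
    Txn(String edit) { this.edit = edit; }
}

/** Illustration of the handler-to-syncer handoff for a single HLog stream. */
final class WalBuffer {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Queue<Txn> current = new ConcurrentLinkedQueue<>();
    private long nextSeqId = 1; // touched only by the single syncer thread
    final BlockingQueue<List<Txn>> synced = new LinkedBlockingQueue<>();

    /** Handler side: many handlers may append concurrently under the read lock. */
    void append(Txn txn) {
        lock.readLock().lock();
        try {
            current.add(txn);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Syncer side: swap under the write lock, then drain without holding any lock. */
    List<Txn> syncOnce() {
        Queue<Txn> previous;
        lock.writeLock().lock();
        try {
            previous = current;
            current = new ConcurrentLinkedQueue<>();
        } finally {
            lock.writeLock().unlock();
        }
        List<Txn> batch = new ArrayList<>();
        for (Txn txn : previous) {
            txn.seqId = nextSeqId++; // sequence ids come from one thread, so they are ordered
            batch.add(txn);          // a real syncer would write the edit to the HLog here
        }
        // A real syncer would sync the HLog stream here before publishing the batch.
        if (!batch.isEmpty()) {
            synced.add(batch);
        }
        return batch;
    }
}
```

The write lock is held only for the pointer swap, so handlers are excluded for a very short time, and the syncer iterates over the previous collection with no handler able to add to it.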



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-05 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13921860#comment-13921860
 ] 

Liyin Tang commented on HBASE-10659:


One of the key motivations is to avoid handlers waiting on the sync thread. This 
model requires more IPC handler threads to reach the maximum QPS. I will share 
more detailed numbers once they are ready.



[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919818#comment-13919818
 ] 

Liyin Tang commented on HBASE-10659:


1) The IPC writer thread does all the sanity checks as preparation, such as 
figuring out which Region the request targets and whether it is enabled.
2) The IPC writer thread hands off the Put request and then starts processing 
the next IPC request. It won't block or wait for the current Put request to 
finish. The responder thread eventually returns the call to the client. 
3) There is one HLogSyncer per WAL, and each HLogSyncer has its own concurrent 
collections to swap between.
4) I don't fully understand your last question. Since the HLogSyncer thread has 
already done the sequencing for each transaction, the memstore-update-thread 
can just reuse the same sequence id as the MVCC number.

The basic motivation for this new write path is to reduce thread 
interleaving and synchronization in the critical write path as much as 
possible.
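
Points 1) and 2) above amount to a non-blocking handoff, which might look roughly like this in Java. It is a sketch under assumptions rather than the actual 0.89-fb code: AsyncWritePath and handlePut are hypothetical names, a single-threaded executor stands in for the HLogSyncer and memstore update stages, and a CompletableFuture stands in for the pending RPC call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustration only: the handler validates, hands off, and returns at once. */
final class AsyncWritePath {
    // Stands in for the HLogSyncer and memstore update stages (assumption).
    private final ExecutorService pipeline =
        Executors.newSingleThreadExecutor(AsyncWritePath::daemon);
    // Stands in for the responder thread pool.
    private final ExecutorService responders =
        Executors.newFixedThreadPool(2, AsyncWritePath::daemon);
    private long nextSeqId = 1; // touched only by the pipeline thread

    private static Thread daemon(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        return t;
    }

    /** Handler side: sanity-check, hand off, and return without waiting for the sync. */
    CompletableFuture<Long> handlePut(String row) {
        if (row == null || row.isEmpty()) {
            throw new IllegalArgumentException("row must not be empty");
        }
        CompletableFuture<Long> reply = new CompletableFuture<>();
        pipeline.execute(() -> {
            long seqId = nextSeqId++; // sequencing happens downstream of the handler
            // The WAL sync and memstore update would happen here.
            responders.execute(() -> reply.complete(seqId)); // responder answers the client
        });
        return reply;
    }
}
```

A caller that issues handlePut("row-1") and then handlePut("row-2") gets two futures back immediately; they complete later from a responder thread, carrying the sequence ids in submission order.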



[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918966#comment-13918966
 ] 

Todd Lipcon commented on HBASE-10659:
-

Curious: do you have just a single memstore update thread per region? Any 
results on whether throughput is better when the workload is skewed towards a 
single hot region on a server?

Are you doing any sorting of the batch before going into the memstore update 
thread? That might result in some better performance as well if you have hot 
and cold regions of keyspace.



[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-03 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919033#comment-13919033
 ] 

Liyin Tang commented on HBASE-10659:


1) Since updating the memstore is much faster than HLog syncing, one 
memstore-update-thread seems to be sufficient. Alternatively, we can make it 
configurable so that each HLogSyncer thread has a corresponding 
memstore-update-thread.

2) The HLogSyncer thread batches multiple transactions from different IPC 
writer threads into a group commit and then syncs this group commit to the 
HLog stream. The memstore-update-thread then takes this group commit and 
updates the corresponding memstores in sequence-id order.
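
The memstore update step in 2) can be sketched in Java along these lines. This is an illustration under assumptions, not the 0.89-fb implementation: SyncedEdit, MemstoreUpdater, applyNextBatch and readPoint are hypothetical names, and a nested skip-list map stands in for the memstore. Each version is stored under its sequence id, and a reader only sees versions at or below the read point, which is how the sequence id doubles as the MVCC number here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical edit that has already been sequenced and synced to the HLog. */
final class SyncedEdit {
    final long seqId;
    final String row;
    final String value;
    SyncedEdit(long seqId, String row, String value) {
        this.seqId = seqId;
        this.row = row;
        this.value = value;
    }
}

/** Illustration only: applies group commits in order, using seqId as the MVCC number. */
final class MemstoreUpdater {
    final BlockingQueue<List<SyncedEdit>> syncedBatches = new LinkedBlockingQueue<>();
    // row -> (seqId -> value); a stand-in for the real memstore structure
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<Long, String>> memstore =
        new ConcurrentSkipListMap<>();
    private final AtomicLong readPoint = new AtomicLong(0);

    /** Applies one group commit if one arrives in time; returns the number of edits applied. */
    int applyNextBatch(long timeoutMs) {
        List<SyncedEdit> batch;
        try {
            batch = syncedBatches.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        }
        if (batch == null) {
            return 0;
        }
        for (SyncedEdit edit : batch) { // the batch is already in sequence-id order
            memstore.computeIfAbsent(edit.row, k -> new ConcurrentSkipListMap<>())
                    .put(edit.seqId, edit.value);
            readPoint.set(edit.seqId);  // the edit becomes visible to readers
        }
        return batch.size();
    }

    long readPoint() {
        return readPoint.get();
    }

    /** Returns the newest version of the row that is visible at the current read point. */
    String get(String row) {
        ConcurrentSkipListMap<Long, String> versions = memstore.get(row);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> visible = versions.floorEntry(readPoint.get());
        return visible == null ? null : visible.getValue();
    }
}
```

Because a single thread applies the batches in the order the syncer produced them, the read point only moves forward and no separate MVCC counter is needed in this sketch.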



[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919049#comment-13919049
 ] 

stack commented on HBASE-10659:
---

bq. IPC handler threads will prepare all Put requests, and append the WALEdit, 
as one transaction, into a concurrent collection with a read lock. And then 
just return;

Return to the client?  Does the client then wait on a notification back from 
the server for when the append completes?  When you say 'prepare all Put 
requests', do you mean the unmarshalling from RPC into a Put instance?

bq. Each HLogSyncer thread is corresponding to one HLog stream. It swaps the 
concurrent collection with a write lock, and then iterate over all the elements 
in the previous concurrent collection, generate the sequence id for each 
transaction, and write to HLog. After the HLog sync is done, append these 
transactions as a batch into a blocking queue.

You have multiple WALs per server?  So one HLogSyncer per WAL?  Are the 
concurrent collections kept per WAL, or do you have one collection and sort it 
by WAL after taking it under the write lock?



bq. Responder thread pool will return the RPC call in parallel.

In // because each MemStore Update Thread, of which there may be many, 
checks out a Responder to reply to the client with its mvcc/sequenceid?

Thanks Liyin.  Just trying to understand and figure out how it maps to trunk.

You like the asynchronous response back there in 0.89fb?  We don't use it in 
trunk as you do.

We keep talking about unifying seqid and mvcc.
