[jira] [Created] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

2016-03-19 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-15482:
--

 Summary: Provide an option to skip calculating block locations for 
SnapshotInputFormat
 Key: HBASE-15482
 URL: https://issues.apache.org/jira/browse/HBASE-15482
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Liyin Tang
Priority: Minor


When a MR job is reading from SnapshotInputFormat, it needs to calculate the 
splits based on the block locations in order to get the best locality. However, 
this process may take a long time for large snapshots. 

In some setups, the computing layer (Spark, Hive, or Presto) can run outside of 
the HBase cluster. In these scenarios, block locality doesn't matter. 
Therefore, it would be great to have an option to skip calculating the block 
locations for every job. That would be super useful for the Hive/Presto/Spark 
connectors.
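
For illustration, a minimal sketch of how a job could opt out; the property 
name below is hypothetical (this jira would define the real key):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SnapshotJobConf {
  // Hypothetical property name; not an existing HBase key.
  static final String SKIP_LOCALITY_KEY =
      "hbase.snapshotinputformat.locality.enabled";

  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // The compute cluster runs outside HBase, so locality is useless:
    // skip the per-split block-location lookup entirely.
    conf.setBoolean(SKIP_LOCALITY_KEY, false);
    return conf;
  }
}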





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

2016-03-18 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202071#comment-15202071
 ] 

Liyin Tang commented on HBASE-15482:


Yeah, that's right. Ideally, if SnapshotInputFormat could read directly from 
the snapshot instead of restoring it, that would be awesome! Restoring millions 
of storefiles also takes a long time. But that is out of the scope of this 
jira.

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -
>
> Key: HBASE-15482
> URL: https://issues.apache.org/jira/browse/HBASE-15482
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Liyin Tang
>Priority: Minor
>
> When a MR job is reading from SnapshotInputFormat, it needs to calculate the 
> splits based on the block locations in order to get the best locality. However, 
> this process may take a long time for large snapshots. 
> In some setups, the computing layer (Spark, Hive, or Presto) can run outside 
> of the HBase cluster. In these scenarios, block locality doesn't matter. 
> Therefore, it would be great to have an option to skip calculating the block 
> locations for every job. That would be super useful for the Hive/Presto/Spark 
> connectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat

2016-03-18 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202539#comment-15202539
 ] 

Liyin Tang commented on HBASE-15482:


Dave, thanks for the response.

Even if we use HDFS snapshots, it would be great to have an option to skip 
calculating block locations. To decouple compute from storage, it is possible 
to set up the computing layer for query engines like Spark/Hive/Presto in a 
different cluster. In these cases, locality doesn't matter for either HBase or 
HDFS snapshots.

> Provide an option to skip calculating block locations for SnapshotInputFormat
> -
>
> Key: HBASE-15482
> URL: https://issues.apache.org/jira/browse/HBASE-15482
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Reporter: Liyin Tang
>Priority: Minor
>
> When a MR job is reading from SnapshotInputFormat, it needs to calculate the 
> splits based on the block locations in order to get the best locality. However, 
> this process may take a long time for large snapshots. 
> In some setups, the computing layer (Spark, Hive, or Presto) can run outside 
> of the HBase cluster. In these scenarios, block locality doesn't matter. 
> Therefore, it would be great to have an option to skip calculating the block 
> locations for every job. That would be super useful for the Hive/Presto/Spark 
> connectors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

2014-06-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018165#comment-14018165
 ] 

Liyin Tang commented on HBASE-8763:
---

Hi, I am out of the office from 9/1/2012 to 9/16/2012 and cannot access this 
email. For urgent cases, please forward your email to liyint...@gmail.com

Thanks a lot
Liyin


 [BRAINSTORM] Combine MVCC and SeqId
 ---

 Key: HBASE-8763
 URL: https://issues.apache.org/jira/browse/HBASE-8763
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Enis Soztutar
Assignee: Jeffrey Zhong
Priority: Critical
 Attachments: HBase MVCC & LogSeqId Combined.pdf, 
 hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, 
 hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, 
 hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch


 HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
 seqId semantics. It seems that having mvcc and the seqId complicates the 
 comparator semantics a lot in regards to flush + WAL replay + compactions + 
 delete markers and out of order puts. 
 Thinking more about it, I don't think we need an MVCC write number that is 
 different from the seqId. We can keep the MVCC semantics, read point and 
 smallest read points intact, but combine mvcc write number and seqId. This 
 will allow cleaner semantics + implementation + smaller data files. 
 We can do some brainstorming for 0.98. We still have to verify that this 
 would be semantically correct, it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10893) Bug in Fast Diff Delta Block Encoding

2014-04-01 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957194#comment-13957194
 ] 

Liyin Tang commented on HBASE-10893:


That's quite a serious bug, and [~manukranthk] already has a fix for it. 

 Bug in Fast Diff Delta Block Encoding
 -

 Key: HBASE-10893
 URL: https://issues.apache.org/jira/browse/HBASE-10893
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.89-fb
Reporter: Manukranth Kolloju
Assignee: Manukranth Kolloju
 Fix For: 0.89-fb


 The following 2 key values, if encoded and decoded, produce wrong results:
 byte[] row = Bytes.toBytes("abcd");
 byte[] family = new byte[] { 'f' };
 byte[] qualifier0 = new byte[] { 'b' };
 byte[] qualifier1 = new byte[] { 'c' };
 byte[] value0 = new byte[] { 0x01 };
 byte[] value1 = new byte[] { 0x00 };
 kvList.add(new KeyValue(row, family, qualifier0, 0, Type.Put, value0));
 kvList.add(new KeyValue(row, family, qualifier1, 0, Type.Put, value1));
 while using Fast Diff Delta Block encoding.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10784) [89-fb] Avoid the unnecessary memory copy for RowCol and DeleteColumn Bloom filters

2014-03-18 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10784:
--

 Summary: [89-fb] Avoid the unnecessary memory copy for RowCol and 
DeleteColumn Bloom filters
 Key: HBASE-10784
 URL: https://issues.apache.org/jira/browse/HBASE-10784
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


For adding to/querying the RowCol and DeleteColumn Bloom filters, there are 
multiple unnecessary memory copy operations. This jira is to address that 
concern and avoid creating these dummy bloom keys as much as possible.
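
For context, a rough sketch of the copy being avoided (illustrative, not the 
actual 89-fb code; the in-place hash below is a simple FNV-1a style fold, not 
HBase's actual bloom hash):

import org.apache.hadoop.hbase.util.Bytes;

public class BloomKeySketch {
  /** Today: materialize a dummy "row + qualifier" key (one extra copy). */
  static byte[] rowColKeyWithCopy(byte[] row, byte[] qualifier) {
    return Bytes.add(row, qualifier);
  }

  /** Goal: hash the two ranges in place, with no intermediate array. */
  static int rowColHashNoCopy(byte[] row, int roff, int rlen,
                              byte[] qual, int qoff, int qlen) {
    int h = 0x811c9dc5;
    for (int i = roff; i < roff + rlen; i++) h = (h ^ row[i]) * 0x01000193;
    for (int i = qoff; i < qoff + qlen; i++) h = (h ^ qual[i]) * 0x01000193;
    return h;
  }
}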



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10191) Move large arena storage off heap

2014-03-13 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934565#comment-13934565
 ] 

Liyin Tang commented on HBASE-10191:


Just curious: has anyone experienced imbalanced memory allocation among the 
NUMA nodes when allocating a large off-heap arena?

 Move large arena storage off heap
 -

 Key: HBASE-10191
 URL: https://issues.apache.org/jira/browse/HBASE-10191
 Project: HBase
  Issue Type: Umbrella
Reporter: Andrew Purtell

 Even with the improved G1 GC in Java 7, Java processes that want to address 
 large regions of memory while also providing low high-percentile latencies 
 continue to be challenged. Fundamentally, a Java server process that has high 
 data throughput and also tight latency SLAs will be stymied by the fact that 
 the JVM does not provide a fully concurrent collector. There is simply not 
 enough throughput to copy data during GC under safepoint (all application 
 threads suspended) within available time bounds. This is increasingly an 
 issue for HBase users operating under dual pressures: 1. tight response SLAs, 
 2. the increasing amount of RAM available in commodity server 
 configurations, because GC load is roughly proportional to heap size.

 We can address this using parallel strategies. We should talk with the Java 
 platform developer community about the possibility of a fully concurrent 
 collector appearing in OpenJDK somehow. Set aside the question of if this is 
 too little too late, if one becomes available the benefit will be immediate 
 though subject to qualification for production, and transparent in terms of 
 code changes. However in the meantime we need an answer for Java versions 
 already in production. This requires we move the large arena allocations off 
 heap, those being the blockcache and memstore. On other JIRAs recently there 
 has been related discussion about combining the blockcache and memstore 
 (HBASE-9399) and on flushing memstore into blockcache (HBASE-5311), which is 
 related work. We should build off heap allocation for memstore and 
 blockcache, perhaps a unified pool for both, and plumb through zero copy 
 direct access to these allocations (via direct buffers) through the read and 
 write I/O paths. This may require the construction of classes that provide 
 object views over data contained within direct buffers. This is something 
 else we could talk with the Java platform developer community about - it 
 could be possible to provide language level object views over off heap 
 memory, on heap objects could hold references to objects backed by off heap 
 memory but not vice versa, maybe facilitated by new intrinsics in Unsafe. 

 Again we need an answer for today also. We should investigate what existing 
 libraries may be available in this regard. Key will be avoiding 
 marshalling/unmarshalling costs. At most we should be copying primitives out 
 of the direct buffers to register or stack locations until finally copying 
 data to construct protobuf Messages. A related issue there is HBASE-9794, 
 which proposes scatter-gather access to KeyValues when constructing RPC 
 messages. We should see how far we can get with that and also zero copy 
 construction of protobuf Messages backed by direct buffer allocations. Some 
 amount of native code may be required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Issue Comment Deleted] (HBASE-10191) Move large arena storage off heap

2014-03-13 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-10191:
---

Comment: was deleted

(was: Just curious: has anyone experienced imbalanced memory allocation 
among the NUMA nodes when allocating a large off-heap arena?)

 Move large arena storage off heap
 -

 Key: HBASE-10191
 URL: https://issues.apache.org/jira/browse/HBASE-10191
 Project: HBase
  Issue Type: Umbrella
Reporter: Andrew Purtell

 Even with the improved G1 GC in Java 7, Java processes that want to address 
 large regions of memory while also providing low high-percentile latencies 
 continue to be challenged. Fundamentally, a Java server process that has high 
 data throughput and also tight latency SLAs will be stymied by the fact that 
 the JVM does not provide a fully concurrent collector. There is simply not 
 enough throughput to copy data during GC under safepoint (all application 
 threads suspended) within available time bounds. This is increasingly an 
 issue for HBase users operating under dual pressures: 1. tight response SLAs, 
 2. the increasing amount of RAM available in commodity server 
 configurations, because GC load is roughly proportional to heap size.

 We can address this using parallel strategies. We should talk with the Java 
 platform developer community about the possibility of a fully concurrent 
 collector appearing in OpenJDK somehow. Set aside the question of if this is 
 too little too late, if one becomes available the benefit will be immediate 
 though subject to qualification for production, and transparent in terms of 
 code changes. However in the meantime we need an answer for Java versions 
 already in production. This requires we move the large arena allocations off 
 heap, those being the blockcache and memstore. On other JIRAs recently there 
 has been related discussion about combining the blockcache and memstore 
 (HBASE-9399) and on flushing memstore into blockcache (HBASE-5311), which is 
 related work. We should build off heap allocation for memstore and 
 blockcache, perhaps a unified pool for both, and plumb through zero copy 
 direct access to these allocations (via direct buffers) through the read and 
 write I/O paths. This may require the construction of classes that provide 
 object views over data contained within direct buffers. This is something 
 else we could talk with the Java platform developer community about - it 
 could be possible to provide language level object views over off heap 
 memory, on heap objects could hold references to objects backed by off heap 
 memory but not vice versa, maybe facilitated by new intrinsics in Unsafe. 

 Again we need an answer for today also. We should investigate what existing 
 libraries may be available in this regard. Key will be avoiding 
 marshalling/unmarshalling costs. At most we should be copying primitives out 
 of the direct buffers to register or stack locations until finally copying 
 data to construct protobuf Messages. A related issue there is HBASE-9794, 
 which proposes scatter-gather access to KeyValues when constructing RPC 
 messages. We should see how far we can get with that and also zero copy 
 construction of protobuf Messages backed by direct buffer allocations. Some 
 amount of native code may be required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-05 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921860#comment-13921860
 ] 

Liyin Tang commented on HBASE-10659:


One of the key motivations is to avoid handlers waiting on the sync thread: 
that model requires more IPC handler threads to reach the maximum QPS. I will 
share more detailed numbers once they are ready.

 [89-fb] Optimize the threading model in HBase write path
 

 Key: HBASE-10659
 URL: https://issues.apache.org/jira/browse/HBASE-10659
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 Recently, we have done multiple prototypes to optimize the HBase (0.89) write 
 path. Based on the simulator results, the following model is able to 
 achieve much higher overall throughput with fewer threads.
 IPC Writer Threads Pool: 
 IPC handler threads will prepare all Put requests, append the WALEdit, as 
 one transaction, into a concurrent collection under a read lock, and then just 
 return.
 HLogSyncer Thread:
 Each HLogSyncer thread corresponds to one HLog stream. It swaps the 
 concurrent collection under a write lock, then iterates over all the 
 elements in the previous concurrent collection, generates the sequence id for 
 each transaction, and writes to the HLog. After the HLog sync is done, it 
 appends these transactions as a batch into a blocking queue. 
 Memstore Update Thread:
 The memstore update thread will poll the blocking queue and update the 
 memstore for each transaction, using the sequence id as the MVCC write number. 
 Once the memstore update is done, it dispatches to the responder thread pool 
 to return to the client.
 Responder Thread Pool:
 The responder thread pool will return the RPC calls in parallel. 
 We are still evaluating this model and will share more results/numbers once 
 they are ready. But we'd really appreciate any comments in advance!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919818#comment-13919818
 ] 

Liyin Tang commented on HBASE-10659:


1) The IPC writer thread will do all the sanity checks as preparation, such as 
figuring out which Region the request targets and whether it is enabled.
2) The IPC writer thread will hand off the Put request and then start to process 
the next IPC request. It won't block or wait for the current Put request to finish. 
The responder thread will finally return the call to the client. 
3) There is one HLogSyncer per WAL, and each HLogSyncer has its own concurrent 
collections to swap between.
4) I don't fully understand your last question. Since the HLogSyncer thread has 
already done the sequencing for each transaction, the memstore-update thread can 
just reuse the same sequence id as the MVCC write number.

The basic motivation of this new write path is to reduce the thread 
interleaving and synchronization in the critical write path as much as 
possible.

 [89-fb] Optimize the threading model in HBase write path
 

 Key: HBASE-10659
 URL: https://issues.apache.org/jira/browse/HBASE-10659
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 Recently, we have done multiple prototypes to optimize the HBase (0.89) write 
 path. Based on the simulator results, the following model is able to 
 achieve much higher overall throughput with fewer threads.
 IPC Writer Threads Pool: 
 IPC handler threads will prepare all Put requests, append the WALEdit, as 
 one transaction, into a concurrent collection under a read lock, and then just 
 return.
 HLogSyncer Thread:
 Each HLogSyncer thread corresponds to one HLog stream. It swaps the 
 concurrent collection under a write lock, then iterates over all the 
 elements in the previous concurrent collection, generates the sequence id for 
 each transaction, and writes to the HLog. After the HLog sync is done, it 
 appends these transactions as a batch into a blocking queue. 
 Memstore Update Thread:
 The memstore update thread will poll the blocking queue and update the 
 memstore for each transaction, using the sequence id as the MVCC write number. 
 Once the memstore update is done, it dispatches to the responder thread pool 
 to return to the client.
 Responder Thread Pool:
 The responder thread pool will return the RPC calls in parallel. 
 We are still evaluating this model and will share more results/numbers once 
 they are ready. But we'd really appreciate any comments in advance!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-03 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10659:
--

 Summary: [89-fb] Optimize the threading model in HBase write path
 Key: HBASE-10659
 URL: https://issues.apache.org/jira/browse/HBASE-10659
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang


Recently, we have done multiple prototypes to optimize the HBase (0.89) write 
path. Based on the simulator results, the following model is able to achieve 
much higher overall throughput with fewer threads.

IPC Writer Threads Pool: 
IPC handler threads will prepare all Put requests, append the WALEdit, as 
one transaction, into a concurrent collection under a read lock, and then just 
return.

HLogSyncer Thread:
Each HLogSyncer thread corresponds to one HLog stream. It swaps the 
concurrent collection under a write lock, then iterates over all the elements 
in the previous concurrent collection, generates the sequence id for each 
transaction, and writes to the HLog. After the HLog sync is done, it appends 
these transactions as a batch into a blocking queue. 

Memstore Update Thread:
The memstore update thread will poll the blocking queue and update the memstore 
for each transaction, using the sequence id as the MVCC write number. Once the 
memstore update is done, it dispatches to the responder thread pool to return 
to the client.

Responder Thread Pool:
The responder thread pool will return the RPC calls in parallel. 

We are still evaluating this model and will share more results/numbers once 
they are ready. But we'd really appreciate any comments in advance!
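
To make the four stages concrete, here is a minimal, self-contained sketch; 
all class and field names are illustrative, not the actual 89-fb code:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WritePathSketch {
  static class Txn {
    long seqId; // assigned by the HLogSyncer, reused later as the MVCC number
    // WALEdit payload elided
  }

  // Handlers append under the read lock; the syncer swaps under the write
  // lock, so many handlers enqueue concurrently and the swap is the only
  // exclusion point.
  private final ReentrantReadWriteLock swapLock = new ReentrantReadWriteLock();
  private volatile ConcurrentLinkedQueue<Txn> pending =
      new ConcurrentLinkedQueue<Txn>();
  private final BlockingQueue<List<Txn>> syncedGroups =
      new LinkedBlockingQueue<List<Txn>>();
  private final AtomicLong seqGen = new AtomicLong();

  /** IPC handler thread: enqueue the transaction and return immediately. */
  public void append(Txn txn) {
    swapLock.readLock().lock();
    try {
      pending.add(txn);
    } finally {
      swapLock.readLock().unlock();
    }
  }

  /** One HLogSyncer iteration: swap, sequence, sync, hand off a group commit. */
  public void syncOnce() throws InterruptedException {
    ConcurrentLinkedQueue<Txn> batch;
    swapLock.writeLock().lock();
    try {
      batch = pending;
      pending = new ConcurrentLinkedQueue<Txn>();
    } finally {
      swapLock.writeLock().unlock();
    }
    List<Txn> group = new ArrayList<Txn>();
    for (Txn t : batch) {
      t.seqId = seqGen.incrementAndGet(); // sequence ids follow HLog write order
      group.add(t);                       // hlog.append(t) would go here
    }
    // hlog.sync() would go here, before the group commit is published
    if (!group.isEmpty()) {
      syncedGroups.put(group);
    }
  }

  /** Memstore update thread: apply each group in seq-id order, then respond. */
  public void applyOnce() throws InterruptedException {
    for (Txn t : syncedGroups.take()) {
      // memstore.upsert(t, t.seqId);             (elided)
      // responderPool.submit(reply-to-client)    (elided)
    }
  }
}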




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path

2014-03-03 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919033#comment-13919033
 ] 

Liyin Tang commented on HBASE-10659:


1) Since updating the memstore is much faster than HLog syncing, one 
memstore-update thread seems to be sufficient. Or we can make it configurable 
so that each HLogSyncer thread has a corresponding memstore-update thread.

2) The HLogSyncer thread will batch multiple transactions, as a group commit, 
from different IPC writer threads, and then sync this group commit into the 
HLog stream. Then the memstore-update thread will take this group commit and 
update the corresponding memstores in (sequence id) order.

 [89-fb] Optimize the threading model in HBase write path
 

 Key: HBASE-10659
 URL: https://issues.apache.org/jira/browse/HBASE-10659
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 Recently, we have done multiple prototypes to optimize the HBase (0.89) write 
 path. Based on the simulator results, the following model is able to 
 achieve much higher overall throughput with fewer threads.
 IPC Writer Threads Pool: 
 IPC handler threads will prepare all Put requests, append the WALEdit, as 
 one transaction, into a concurrent collection under a read lock, and then just 
 return.
 HLogSyncer Thread:
 Each HLogSyncer thread corresponds to one HLog stream. It swaps the 
 concurrent collection under a write lock, then iterates over all the 
 elements in the previous concurrent collection, generates the sequence id for 
 each transaction, and writes to the HLog. After the HLog sync is done, it 
 appends these transactions as a batch into a blocking queue. 
 Memstore Update Thread:
 The memstore update thread will poll the blocking queue and update the 
 memstore for each transaction, using the sequence id as the MVCC write number. 
 Once the memstore update is done, it dispatches to the responder thread pool 
 to return to the client.
 Responder Thread Pool:
 The responder thread pool will return the RPC calls in parallel. 
 We are still evaluating this model and will share more results/numbers once 
 they are ready. But we'd really appreciate any comments in advance!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10578) For the same row key, the KV in the newest StoreFile should be returned

2014-02-20 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907555#comment-13907555
 ] 

Liyin Tang commented on HBASE-10578:


Nice find!

 For the same row key, the KV in the newest StoreFile should be returned
 ---

 Key: HBASE-10578
 URL: https://issues.apache.org/jira/browse/HBASE-10578
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.89-fb, 0.98.1
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor
 Fix For: 0.89-fb

 Attachments: HBASE-10578.patch


 When multiple scanners have the same KV, HBase should pick the newest one,
 i.e. pick the KV from the store file with the largest seq id.
 In the KeyValueHeap generalizedSeek implementation, we seem to prefer the 
 current
 scanner over the scanners in the heap -- THIS IS WRONG.
 The diff adds a unit test to make sure that bulk loading works correctly, and 
 fixes the issue.
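
 The intended tie-break rule, as a tiny sketch (names illustrative):

 class ScannerStub {
   final long storeFileSeqId; // seq id of the backing store file
   ScannerStub(long storeFileSeqId) { this.storeFileSeqId = storeFileSeqId; }
 }

 class TieBreak {
   /** On keys that compare equal, the scanner backed by the newest store
    *  file (largest seq id) must win -- never simply the current scanner. */
   static ScannerStub pickOnEqualKey(ScannerStub current, ScannerStub fromHeap) {
     return current.storeFileSeqId >= fromHeap.storeFileSeqId ? current : fromHeap;
   }
 }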



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.

2014-02-11 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10502:
--

 Summary: [89-fb] ParallelScanner: a client utility to perform 
multiple scan requests in parallel.
 Key: HBASE-10502
 URL: https://issues.apache.org/jira/browse/HBASE-10502
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
 Fix For: 0.89-fb


ParallelScanner is a utility class for the HBase client to perform multiple 
scan requests in parallel. It requires all the scan requests to have the same 
caching size, for simplicity. 
 
This class provides 3 very basic functionalities: 
* The initialize function will initialize all the ResultScanners by calling 
{@link HTable#getScanner(Scan)} in parallel for each scan request.
* The next function will call the corresponding {@link ResultScanner#next(int 
numRows)} for each scan request in parallel, and then return all the results 
together as a list. Also, if the result list is empty, it indicates there is no 
data left in any of the scanners, and the user can call {@link #close()} 
afterwards.
* The close function will close all the scanners and shut down the thread pool.
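
A minimal sketch of such a utility against the client API of that era 
({@link HTable#getScanner(Scan)}, {@link ResultScanner#next(int)}); class and 
field names are illustrative, and a real implementation must account for 
HTable not being thread-safe:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ParallelScannerSketch {
  private final ExecutorService pool = Executors.newCachedThreadPool();
  private final List<ResultScanner> scanners = new ArrayList<ResultScanner>();
  private final int caching; // all scans share the same caching size

  public ParallelScannerSketch(int caching) { this.caching = caching; }

  /** Open one ResultScanner per Scan, in parallel.
   *  (Real code would use one HTable per worker: HTable is not thread-safe.) */
  public void initialize(final HTable table, List<Scan> scans) throws Exception {
    List<Future<ResultScanner>> futures = new ArrayList<Future<ResultScanner>>();
    for (final Scan scan : scans) {
      scan.setCaching(caching);
      futures.add(pool.submit(new Callable<ResultScanner>() {
        public ResultScanner call() throws IOException {
          return table.getScanner(scan);
        }
      }));
    }
    for (Future<ResultScanner> f : futures) scanners.add(f.get());
  }

  /** Fetch the next batch from every scanner in parallel; empty list == done. */
  public List<Result> next() throws Exception {
    List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
    for (final ResultScanner s : scanners) {
      futures.add(pool.submit(new Callable<Result[]>() {
        public Result[] call() throws IOException { return s.next(caching); }
      }));
    }
    List<Result> all = new ArrayList<Result>();
    for (Future<Result[]> f : futures) {
      for (Result r : f.get()) all.add(r);
    }
    return all;
  }

  /** Close all the scanners and shut down the thread pool. */
  public void close() {
    for (ResultScanner s : scanners) s.close();
    pool.shutdown();
  }
}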




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.

2014-02-11 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898128#comment-13898128
 ] 

Liyin Tang commented on HBASE-10502:


By skimming through HBASE-9272, the semantics seem to be a little different. In 
this case, the client actually wants to construct multiple scan requests, while 
HBASE-9272 is about performing a single scan request in parallel. 


 [89-fb] ParallelScanner: a client utility to perform multiple scan requests 
 in parallel.
 

 Key: HBASE-10502
 URL: https://issues.apache.org/jira/browse/HBASE-10502
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
 Fix For: 0.89-fb


 ParallelScanner is a utility class for the HBase client to perform multiple 
 scan requests in parallel. It requires all the scan requests to have the same 
 caching size, for simplicity. 
  
 This class provides 3 very basic functionalities: 
 * The initialize function will initialize all the ResultScanners by calling 
 {@link HTable#getScanner(Scan)} in parallel for each scan request.
 * The next function will call the corresponding {@link ResultScanner#next(int 
 numRows)} for each scan request in parallel, and then return all the results 
 together as a list. Also, if the result list is empty, it indicates there is no 
 data left in any of the scanners, and the user can call {@link #close()} 
 afterwards.
 * The close function will close all the scanners and shut down the thread pool.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.

2014-02-11 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898132#comment-13898132
 ] 

Liyin Tang commented on HBASE-10502:


Actually, HBASE-9272 + HBASE-10502 together are quite effective for optimizing 
join queries. Assuming a join query where Table A joins Table B on row key / 
some prefix, HBASE-9272 is useful for issuing the initial scan in parallel to 
retrieve all the join keys; then, based on the join keys, multiple scan 
requests for Table B can be constructed and submitted in parallel via 
HBASE-10502.

 [89-fb] ParallelScanner: a client utility to perform multiple scan requests 
 in parallel.
 

 Key: HBASE-10502
 URL: https://issues.apache.org/jira/browse/HBASE-10502
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
 Fix For: 0.89-fb


 ParallelScanner is a utility class for the HBase client to perform multiple 
 scan requests in parallel. It requires all the scan requests to have the same 
 caching size, for simplicity. 
  
 This class provides 3 very basic functionalities: 
 * The initialize function will initialize all the ResultScanners by calling 
 {@link HTable#getScanner(Scan)} in parallel for each scan request.
 * The next function will call the corresponding {@link ResultScanner#next(int 
 numRows)} for each scan request in parallel, and then return all the results 
 together as a list. Also, if the result list is empty, it indicates there is no 
 data left in any of the scanners, and the user can call {@link #close()} 
 afterwards.
 * The close function will close all the scanners and shut down the thread pool.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.

2014-02-11 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898138#comment-13898138
 ] 

Liyin Tang commented on HBASE-10502:


In addition, the API of HBASE-10502 seems more flexible (to me), because if 
there is a single scan request spanning multiple region boundaries, the HBase 
client is always able to split it into multiple region-local scan requests and 
then submit them to HBASE-10502 for parallel execution.


 [89-fb] ParallelScanner: a client utility to perform multiple scan requests 
 in parallel.
 

 Key: HBASE-10502
 URL: https://issues.apache.org/jira/browse/HBASE-10502
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
 Fix For: 0.89-fb


 ParallelScanner is a utility class for the HBase client to perform multiple 
 scan requests in parallel. It requires all the scan requests to have the same 
 caching size, for simplicity. 
  
 This class provides 3 very basic functionalities: 
 * The initialize function will initialize all the ResultScanners by calling 
 {@link HTable#getScanner(Scan)} in parallel for each scan request.
 * The next function will call the corresponding {@link ResultScanner#next(int 
 numRows)} for each scan request in parallel, and then return all the results 
 together as a list. Also, if the result list is empty, it indicates there is no 
 data left in any of the scanners, and the user can call {@link #close()} 
 afterwards.
 * The close function will close all the scanners and shut down the thread pool.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency

2014-01-28 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884979#comment-13884979
 ] 

Liyin Tang commented on HBASE-10296:


Speaking of Raft implementations, we (the FB HBase team) are very close to 
open-sourcing a Raft implementation as a library. And there are multiple 
potential ways to integrate the Raft protocol into the HBase/HDFS software 
stack.



 Replace ZK with a consensus lib(paxos,zab or raft) running within master 
 processes to provide better master failover performance and state consistency
 --

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness 
 and store almost all of its state, such as region states, table info, 
 replication info and so on. And zk also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But zk as a communication channel is fragile due to its one-time watch and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example, the master must rely on the state 
 transition logic's idempotence to maintain the region assignment state 
 machine's correctness. Actually, almost all of the trickiest inconsistency 
 issues can trace their root cause back to the fragility of zk as a 
 communication channel.
 Replacing zk with paxos running within master processes has the following benefits:
 1. better master failover performance: all masters, either the active or the 
 standby ones, have the same latest states in memory (except lagging ones, which 
 can eventually catch up later on). whenever the active master dies, the newly 
 elected active master can immediately play its role without such failover 
 work as building its in-memory states by consulting the meta-table and zk.
 2. better state consistency: the master's in-memory states are the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 and though the states are held by all masters, paxos guarantees they are 
 identical at any time.
 3. more direct and simple communication pattern: clients change state by 
 sending requests to the master, and master and regionservers talk directly to 
 each other by sending requests and responses...all without bothering to use a 
 third-party storage like zk, which can introduce more uncertainty, worse 
 latency and more complexity.
 4. zk can then be used only for liveness monitoring, to determine whether a 
 regionserver is dead, and later on we can eliminate zk totally when we build 
 heartbeats between master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves deep 
 thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE

2014-01-22 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879091#comment-13879091
 ] 

Liyin Tang commented on HBASE-7404:
---

Liang, just curious: what's the top contributor to the p99 latency in your 
case?

 Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
 --

 Key: HBASE-7404
 URL: https://issues.apache.org/jira/browse/HBASE-7404
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.95.0

 Attachments: 7404-0.94-fixed-lines.txt, 7404-trunk-v10.patch, 
 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 
 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, 
 HBASE-7404-backport-0.94.patch, Introduction of Bucket Cache.pdf, 
 hbase-7404-94v2.patch, hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch


 First, thanks @neil from Fusion-IO for sharing the source code.
 Usage:
 1. Use bucket cache as the main memory cache, configured as follows:
 - hbase.bucketcache.ioengine: heap (or offheap if using off-heap memory to 
 cache blocks)
 - hbase.bucketcache.size: 0.4 (size for bucket cache; 0.4 is a percentage of 
 max heap size)
 2. Use bucket cache as a secondary cache, configured as follows:
 - hbase.bucketcache.ioengine: file:/disk1/hbase/cache.data (the file path 
 where the block data is stored)
 - hbase.bucketcache.size: 1024 (size for bucket cache; the unit is MB, so 1024 
 means 1GB)
 - hbase.bucketcache.combinedcache.enabled: false (default value being true)
 See more configurations in org.apache.hadoop.hbase.io.hfile.CacheConfig and 
 org.apache.hadoop.hbase.io.hfile.bucket.BucketCache
 What's Bucket Cache? 
 It can greatly decrease CMS and heap fragmentation from GC
 It supports a large cache space for high read performance by using high-speed 
 disks like Fusion-io
 1. An implementation of block cache like LruBlockCache
 2. Self-manages the blocks' storage positions through the Bucket Allocator
 3. The cached blocks can be stored in memory or on the file system
 4. Bucket Cache can be used as a main block cache (see CombinedBlockCache), 
 combined with LruBlockCache to decrease CMS and fragmentation from GC.
 5. BucketCache can also be used as a secondary cache (e.g. using Fusion-io to 
 store blocks) to enlarge the cache space
 How about SlabCache?
 We studied and tested SlabCache first, but the results were bad, because:
 1. SlabCache uses SingleSizeCache, so its memory utilization is low because of 
 the variety of block sizes, especially when using DataBlockEncoding
 2. SlabCache is used in DoubleBlockCache; a block is cached both in SlabCache 
 and LruBlockCache, and is put into LruBlockCache again if hit in SlabCache, 
 so CMS and heap fragmentation don't get any better
 3. Direct (off-heap) performance is not as good as heap, and may cause OOM, so 
 we recommend using the heap engine 
 See more in the attachment and in the patch
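
 For illustration, the two setups above expressed programmatically (a sketch 
 using only the keys listed above):

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;

 public class BucketCacheConfig {
   /** Main memory cache: size is a fraction of the max heap. */
   public static Configuration mainCache() {
     Configuration conf = HBaseConfiguration.create();
     conf.set("hbase.bucketcache.ioengine", "offheap"); // or "heap"
     conf.setFloat("hbase.bucketcache.size", 0.4f);
     return conf;
   }

   /** Secondary cache backed by a file on fast storage; size is in MB. */
   public static Configuration secondaryCache() {
     Configuration conf = HBaseConfiguration.create();
     conf.set("hbase.bucketcache.ioengine", "file:/disk1/hbase/cache.data");
     conf.setInt("hbase.bucketcache.size", 1024); // 1 GB
     conf.setBoolean("hbase.bucketcache.combinedcache.enabled", false);
     return conf;
   }
 }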



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter

2014-01-15 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872414#comment-13872414
 ] 

Liyin Tang commented on HBASE-10342:


Interesting. If the most significant part of the row key is evenly 
distributed across the row key space, we usually don't need to salt the table, 
right?

 RowKey Prefix Bloom Filter
 --

 Key: HBASE-10342
 URL: https://issues.apache.org/jira/browse/HBASE-10342
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 When designing an HBase schema for some use cases, it is quite common to 
 combine multiple pieces of information within the RowKey. For instance, 
 assuming that the rowkey is constructed as md5(id1) + id1 + id2, and the user 
 wants to scan all the rowkeys starting with a given id1. In such a case, the 
 rowkey bloom filter is able to cut more unnecessary seeks during the scan.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7509) Enable RS to query a secondary datanode in parallel, if the primary takes too long

2014-01-15 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872509#comment-13872509
 ] 

Liyin Tang commented on HBASE-7509:
---

I guess Amitanand probably has a diff for 89-fb, but not for trunk yet. 
Thanks, Liang, for following up!

 Enable RS to query a secondary datanode in parallel, if the primary takes too 
 long
 --

 Key: HBASE-7509
 URL: https://issues.apache.org/jira/browse/HBASE-7509
 Project: HBase
  Issue Type: Improvement
Reporter: Amitanand Aiyer
Assignee: Liang Xie
Priority: Critical
 Attachments: quorumDiffs.tgz






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10360) [89-fb] Expose the HRegionLocations for each HTable in an efficient way

2014-01-15 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10360:
--

 Summary: [89-fb] Expose the HRegionLocations for each HTable in an 
efficient way
 Key: HBASE-10360
 URL: https://issues.apache.org/jira/browse/HBASE-10360
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


HTable.getHRegionInfo() will return the RegionServer address for each HRegion 
by scanning the META table. Actually, the HConnectionManager could cache these 
data and refresh the client location cache directly. Also, HTable could expose 
another API to return the cached HRegionLocations directly without scanning 
the META table.
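
A sketch of what such an API might look like (the interface and method are 
hypothetical, not existing HTable methods):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.HRegionLocation;

/** Hypothetical client-side addition sketched for this jira. */
public interface CachedRegionLocator {
  /**
   * Return the HRegionLocations for this table from the HConnectionManager
   * cache, refreshing an entry only on a miss, instead of scanning the META
   * table on every call.
   */
  List<HRegionLocation> getCachedRegionLocations() throws IOException;
}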



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7509) Enable RS to query a secondary datanode in parallel, if the primary takes too long

2014-01-15 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873019#comment-13873019
 ] 

Liyin Tang commented on HBASE-7509:
---

[~xieliang007], actually I think it makes more sense to assign the JIRA to you, 
considering you are the one actively working on the diff for HBase trunk.

 Enable RS to query a secondary datanode in parallel, if the primary takes too 
 long
 --

 Key: HBASE-7509
 URL: https://issues.apache.org/jira/browse/HBASE-7509
 Project: HBase
  Issue Type: Improvement
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Critical
 Attachments: quorumDiffs.tgz






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10342) RowKey Prefix Bloom Filter

2014-01-14 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10342:
--

 Summary: RowKey Prefix Bloom Filter
 Key: HBASE-10342
 URL: https://issues.apache.org/jira/browse/HBASE-10342
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang


When designing an HBase schema for some use cases, it is quite common to 
combine multiple pieces of information within the RowKey. For instance, 
assuming that the rowkey is constructed as md5(id1) + id1 + id2, and the user 
wants to scan all the rowkeys starting with a given id1. In such a case, the 
rowkey bloom filter is able to cut more unnecessary seeks during the scan.
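
For illustration, a minimal sketch of that schema and the prefix scan it 
enables (class and helper names are illustrative):

import java.security.MessageDigest;
import java.util.Arrays;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixRowKeys {
  /** rowkey = md5(id1) + id1 + id2 */
  static byte[] rowKey(byte[] id1, byte[] id2) throws Exception {
    return Bytes.add(MessageDigest.getInstance("MD5").digest(id1), id1, id2);
  }

  /** Scan every row sharing the md5(id1) + id1 prefix; a rowkey prefix
   *  bloom filter could then skip storefiles lacking that prefix. */
  static Scan prefixScan(byte[] id1) throws Exception {
    byte[] prefix = Bytes.add(MessageDigest.getInstance("MD5").digest(id1), id1);
    Scan scan = new Scan();
    scan.setStartRow(prefix);
    scan.setStopRow(nextPrefix(prefix)); // exclusive upper bound
    return scan;
  }

  /** Smallest rowkey strictly greater than every key with this prefix. */
  static byte[] nextPrefix(byte[] prefix) {
    byte[] stop = Arrays.copyOf(prefix, prefix.length);
    for (int i = stop.length - 1; i >= 0; i--) {
      if (++stop[i] != 0) {
        return Arrays.copyOf(stop, i + 1);
      }
    }
    return new byte[0]; // prefix was all 0xFF: scan to the end of the table
  }
}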



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter

2014-01-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871684#comment-13871684
 ] 

Liyin Tang commented on HBASE-10342:


This feature should benefit Salted Tables as well. 

 RowKey Prefix Bloom Filter
 --

 Key: HBASE-10342
 URL: https://issues.apache.org/jira/browse/HBASE-10342
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 When designing an HBase schema for some use cases, it is quite common to 
 combine multiple pieces of information within the RowKey. For instance, 
 assuming that the rowkey is constructed as md5(id1) + id1 + id2, and the user 
 wants to scan all the rowkeys starting with a given id1. In such a case, the 
 rowkey bloom filter is able to cut more unnecessary seeks during the scan.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10343) Write the last sequence id into the HLog during the RegionOpen time

2014-01-14 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10343:
--

 Summary: Write the last sequence id into the HLog during the 
RegionOpen time
 Key: HBASE-10343
 URL: https://issues.apache.org/jira/browse/HBASE-10343
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


HLog-based async replication has a challenge: guaranteeing in-order delivery 
when a Region moves from one HLog stream to another HLog stream. 

One approach is to record the last_sequence_id in the new HLog stream when 
opening the Region, so the replication framework is able to catch up to the 
last_sequence_id from the previous HLog stream before replicating any new 
transactions through the new HLog stream.
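
A sketch of the idea with hypothetical types (the actual WAL and replication 
interfaces differ):

/** Hypothetical marker appended to the new HLog stream at region-open time. */
class RegionOpenMarker {
  final byte[] encodedRegionName;
  final long lastSequenceId; // highest seq id written to the previous HLog stream

  RegionOpenMarker(byte[] encodedRegionName, long lastSequenceId) {
    this.encodedRegionName = encodedRegionName;
    this.lastSequenceId = lastSequenceId;
  }
}

// Replication side (pseudo-flow): on reading the marker from the new stream,
// hold back that region's edits until the old stream has shipped everything
// up to lastSequenceId, which preserves in-order delivery across the move.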





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing

2014-01-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871719#comment-13871719
 ] 

Liyin Tang commented on HBASE-10275:


HBASE-10343 might resolve this issue in a much easier way.

 [89-fb] Guarantee the sequenceID in each Region is strictly monotonic 
 increasing
 

 Key: HBASE-10275
 URL: https://issues.apache.org/jira/browse/HBASE-10275
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

 [HBASE-8741] has implemented the per-region sequence ID. It would be even 
 better to guarantee that the sequencing is strictly monotonically increasing 
 so that HLog-Based Async Replication is able to deliver transactions in order 
 in the case of region movements.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter

2014-01-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871725#comment-13871725
 ] 

Liyin Tang commented on HBASE-10342:


Yes, a prefix-hash memstore would help this case as well! It is definitely 
worth benchmarking.
 

 RowKey Prefix Bloom Filter
 --

 Key: HBASE-10342
 URL: https://issues.apache.org/jira/browse/HBASE-10342
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 When designing an HBase schema for some use cases, it is quite common to 
 combine multiple pieces of information within the RowKey. For instance, 
 assuming that the rowkey is constructed as md5(id1) + id1 + id2, and the user 
 wants to scan all the rowkeys starting with a given id1. In such a case, the 
 rowkey bloom filter is able to cut more unnecessary seeks during the scan.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10342) RowKey Prefix Bloom Filter

2014-01-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-10342:
---

Description: When designing HBase schema for some use cases, it is quite 
common to combine multiple information within the RowKey. For instance, 
assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to 
scan all the rowkeys which starting by id1. In such case, the rowkey bloom 
filter is able to cut more unnecessary seeks during the scan.  (was: When 
designing HBase schema for some use cases, it is quite common to combine 
multiple information within the RowKey. For instance, assuming that rowkey is 
constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys 
which starting at id1 . In such case, the rowkey bloom filter is able to cut 
more unnecessary seeks during the scan.)

 RowKey Prefix Bloom Filter
 --

 Key: HBASE-10342
 URL: https://issues.apache.org/jira/browse/HBASE-10342
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 When designing an HBase schema for some use cases, it is quite common to 
 combine multiple pieces of information within the RowKey. For instance, 
 assuming that the rowkey is constructed as md5(id1) + id1 + id2, and the user 
 wants to scan all the rowkeys starting with a given id1. In such a case, the 
 rowkey bloom filter is able to cut more unnecessary seeks during the scan.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

2014-01-03 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861765#comment-13861765
 ] 

Liyin Tang commented on HBASE-8741:
---

It seems like the diff is using the following hashmap for the regionName to 
last sequence mapping. The javadoc mentions: "It works in our use case as we 
use {@link HRegionInfo#getEncodedNameAsBytes()} as keys. For a given region, it 
always returns the same array." 

private Map<byte[], Long> latestSequenceNums = new HashMap<byte[], Long>(); 

However, if a region has been re-opened before the HLog rolls, then there will 
be 2 entries for the same region in this mapping, because the hashcodes for 
these 2 byte[] arrays will be different, right?
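
To make the concern concrete (a self-contained sketch): byte[] uses identity 
hashCode/equals, so a HashMap keeps duplicate entries for equal-content keys, 
while a content comparator (as 89-fb uses) collapses them:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayKeyDemo {
  public static void main(String[] args) {
    byte[] k1 = Bytes.toBytes("region-1");
    byte[] k2 = Bytes.toBytes("region-1"); // same content, different array

    Map<byte[], Long> hash = new HashMap<byte[], Long>();
    hash.put(k1, 10L);
    hash.put(k2, 20L);
    System.out.println(hash.size()); // 2 -- duplicate entries for one region

    Map<byte[], Long> sorted =
        new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
    sorted.put(k1, 10L);
    sorted.put(k2, 20L);
    System.out.println(sorted.size()); // 1 -- keys compared by content
  }
}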

 Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
 Regions in recovery mode might have same sequenceIDs)
 

 Key: HBASE-8741
 URL: https://issues.apache.org/jira/browse/HBASE-8741
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Affects Versions: 0.95.1
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.98.0

 Attachments: HBASE-8741-trunk-v6.1-rebased.patch, 
 HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, 
 HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, 
 HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, 
 HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, 
 HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, 
 HBASE-8741-v5.patch


 Currently, when opening a region, we find the maximum sequence ID from all 
 its HFiles and then set the LogSequenceId of the log (in case the latter is at 
 a smaller value). This works well in the recovered.edits case, as we are not 
 writing to the region until we have replayed all of its previous edits. 
 With distributed log replay, if we want to enable writes while a region is 
 under recovery, we need to make sure that the logSequenceId > maximum 
 logSequenceId of the old regionserver. Otherwise, we might have a situation 
 where new edits have the same (or smaller) sequenceIds. 
 If we can store region-level information in the WALTrailer, then this scenario 
 could be avoided by:
 a) reading the trailer of the last completed file, i.e., the last wal file 
 which has a trailer and,
 b) completely reading the last wal file (this file would not have the 
 trailer, so it needs to be read completely).
 In future, if we switch to multiple wal files, we could read the trailer for 
 all completed WAL files, and read the remaining incomplete files completely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing

2014-01-03 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10275:
--

 Summary: [89-fb] Guarantee the sequenceID in each Region is 
strictly monotonic increasing
 Key: HBASE-10275
 URL: https://issues.apache.org/jira/browse/HBASE-10275
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang


[HBASE-8741] has implemented the per-region sequence ID. It would be even 
better to guarantee that the sequencing is strictly monotonically increasing so 
that HLog-Based Async Replication is able to deliver transactions in order in 
the case of region movements.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing

2014-01-03 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862208#comment-13862208
 ] 

Liyin Tang commented on HBASE-10275:


The problem you have described is exactly what we want to resolve. Basically, 
if the sequenceID for each region is strictly monotonically increasing, then in 
the case of a region moving from A to B, the replication stream in B would know 
the gap/lag for that region in the previous replication stream A. 

As you mentioned, but slightly different: the fix is to guarantee that the old 
hlog entries of a region from the previous region server have been fully 
replicated before starting to replicate that region from the new region server.

 [89-fb] Guarantee the sequenceID in each Region is strictly monotonic 
 increasing
 

 Key: HBASE-10275
 URL: https://issues.apache.org/jira/browse/HBASE-10275
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Liyin Tang

 [HBASE-8741] has implemented the per-region sequence ID. It would be even 
 better to guarantee that the sequencing is strictly monotonically increasing 
 so that HLog-Based Async Replication is able to deliver transactions in order 
 in the case of region movements.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)

2014-01-03 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862218#comment-13862218
 ] 

Liyin Tang commented on HBASE-8741:
---

Good to know :) Not sure whether it is worth amending the above implication in 
the javadoc. Basically, latestSequenceNums might contain duplicate keys for the 
same region. In 89-fb, we just use a ConcurrentSkipListMap, in case this map 
might be reused for other purposes.

Anyway, thanks for the explanation ! Nice feature indeed !


 Scope sequenceid to the region rather than regionserver (WAS: Mutations on 
 Regions in recovery mode might have same sequenceIDs)
 

 Key: HBASE-8741
 URL: https://issues.apache.org/jira/browse/HBASE-8741
 Project: HBase
  Issue Type: Bug
  Components: MTTR
Affects Versions: 0.95.1
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.98.0

 Attachments: HBASE-8741-trunk-v6.1-rebased.patch, 
 HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, 
 HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, 
 HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, 
 HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, 
 HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, 
 HBASE-8741-v5.patch


 Currently, when opening a region, we find the maximum sequence ID from all 
 its HFiles and then set the LogSequenceId of the log (in case the latter is at 
 a smaller value). This works well in the recovered.edits case, as we are not 
 writing to the region until we have replayed all of its previous edits. 
 With distributed log replay, if we want to enable writes while a region is 
 under recovery, we need to make sure that the logSequenceId > maximum 
 logSequenceId of the old regionserver. Otherwise, we might have a situation 
 where new edits have the same (or smaller) sequenceIds. 
 If we can store region-level information in the WALTrailer, then this scenario 
 could be avoided by:
 a) reading the trailer of the last completed file, i.e., the last wal file 
 which has a trailer and,
 b) completely reading the last wal file (this file would not have the 
 trailer, so it needs to be read completely).
 In future, if we switch to multiple wal files, we could read the trailer for 
 all completed WAL files, and read the remaining incomplete files completely.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

2013-12-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857630#comment-13857630
 ] 

Liyin Tang commented on HBASE-8763:
---

I totally vote for combining MVCC and SeqID. Furthermore, it will be even 
more straightforward if the SeqID is not shared across all the Regions. 
Ideally, each region should have its own monotonically increasing seq id. 

 [BRAINSTORM] Combine MVCC and SeqId
 ---

 Key: HBASE-8763
 URL: https://issues.apache.org/jira/browse/HBASE-8763
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Enis Soztutar
Priority: Critical
 Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch


 HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
 seqId semantics. It seems that having mvcc and the seqId complicates the 
 comparator semantics a lot in regards to flush + WAL replay + compactions + 
 delete markers and out of order puts. 
 Thinking more about it, I don't think we need an MVCC write number that is 
 different from the seqId. We can keep the MVCC semantics, read point and 
 smallest read points intact, but combine mvcc write number and seqId. This 
 will allow cleaner semantics + implementation + smaller data files. 
 We can do some brainstorming for 0.98. We still have to verify that this 
 would be semantically correct, it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

2013-12-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857680#comment-13857680
 ] 

Liyin Tang commented on HBASE-8763:
---

Thanks for the jira! 
If the SeqID is already on a per-region basis, and we want to combine the MVCC 
with it, then how do we want to handle the group commit across multiple regions? 

 [BRAINSTORM] Combine MVCC and SeqId
 ---

 Key: HBASE-8763
 URL: https://issues.apache.org/jira/browse/HBASE-8763
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Enis Soztutar
Priority: Critical
 Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch


 HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
 seqId semantics. It seems that having mvcc and the seqId complicates the 
 comparator semantics a lot in regards to flush + WAL replay + compactions + 
 delete markers and out of order puts. 
 Thinking more about it I don't think we need a MVCC write number which is 
 different than the seqId. We can keep the MVCC semantics, read point and 
 smallest read points intact, but combine mvcc write number and seqId. This 
 will allow cleaner semantics + implementation + smaller data files. 
 We can do some brainstorming for 0.98. We still have to verify that this 
 would be semantically correct; it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

2013-12-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857820#comment-13857820
 ] 

Liyin Tang commented on HBASE-8763:
---

[~jeffreyz], I see. Thanks for the clarification; it makes sense to me now!

 [BRAINSTORM] Combine MVCC and SeqId
 ---

 Key: HBASE-8763
 URL: https://issues.apache.org/jira/browse/HBASE-8763
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Enis Soztutar
Priority: Critical
 Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch


 HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
 seqId semantics. It seems that having mvcc and the seqId complicates the 
 comparator semantics a lot in regards to flush + WAL replay + compactions + 
 delete markers and out of order puts. 
 Thinking more about it I don't think we need a MVCC write number which is 
 different than the seqId. We can keep the MVCC semantics, read point and 
 smallest read points intact, but combine mvcc write number and seqId. This 
 will allow cleaner semantics + implementation + smaller data files. 
 We can do some brainstorming for 0.98. We still have to verify that this 
 would be semantically correct; it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10083) [89-fb] Better error handling for the compound bloom filter

2013-12-04 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10083:
--

 Summary: [89-fb] Better error handling for the compound bloom 
filter 
 Key: HBASE-10083
 URL: https://issues.apache.org/jira/browse/HBASE-10083
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89-fb
Reporter: Liyin Tang
Assignee: Liyin Tang


When the RegionServer fails to load a bloom block from HDFS due to a timeout or 
other reasons, it throws out the exception and disables the entire bloom filter 
for this HFile. This behavior does not make much sense, especially for the 
compound bloom filter. 

Instead of disabling the bloom filter for the entire file, it could just return 
a potentially false-positive result (true) and keep the bloom filter available.
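
A hedged sketch of that fallback, where loadBloomBlock, bloomBlockIndexFor and checkBloom are invented stand-ins rather than the actual compound bloom filter API:

{code}
boolean mightContain(byte[] key) {
  try {
    // may require an HDFS read if the bloom block is not already cached
    return checkBloom(loadBloomBlock(bloomBlockIndexFor(key)), key);
  } catch (IOException e) {
    LOG.warn("Failed to load bloom block; returning potential false positive", e);
    // "true" is always a safe bloom answer: the caller simply falls back to
    // reading the data block, and the bloom filter stays enabled for later gets
    return true;
  }
}
{code}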



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-10009) Fix potential OOM exception in HTableMultiplexer

2013-11-19 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-10009:
--

 Summary: Fix potential OOM exception in HTableMultiplexer
 Key: HBASE-10009
 URL: https://issues.apache.org/jira/browse/HBASE-10009
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Liyin Tang
Assignee: Manukranth Kolloju
Priority: Minor


HTableMultiplexer is our thread-safe, non-blocking API. 
HTableMultiplexer.getHTable is supposed to cache HTable instances, but it fails 
to do so if it is called on a table with the same name but a different 
reference. Fix this behavior and add a unit test case to the existing 
TestHtableMultiplexer class.
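
Since byte arrays use identity-based equals/hashCode, a cache keyed on the byte[] reference misses when the same table name arrives as a different array. A minimal sketch of a content-keyed cache (field and method names invented; not the real HTableMultiplexer internals):

{code}
private final ConcurrentMap<String, HTable> tableCache =
    new ConcurrentHashMap<String, HTable>();

HTable getHTable(byte[] tableName) throws IOException {
  String key = Bytes.toString(tableName);      // content-based key
  HTable table = tableCache.get(key);
  if (table == null) {
    table = new HTable(conf, tableName);
    HTable existing = tableCache.putIfAbsent(key, table);
    if (existing != null) {
      table = existing;                        // another thread won the race
    }
  }
  return table;
}
{code}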



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree

2013-11-18 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825519#comment-13825519
 ] 

Liyin Tang commented on HBASE-9969:
---

That's a very promising idea! Will take a closer look. 
Nice work [~stepinto]!

 Improve KeyValueHeap using loser tree
 -

 Key: HBASE-9969
 URL: https://issues.apache.org/jira/browse/HBASE-9969
 Project: HBase
  Issue Type: Improvement
  Components: Performance, regionserver
Reporter: Chao Shi
Assignee: Chao Shi
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, 
 hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt


 A loser tree is a better data structure than a binary heap here. It saves half 
 of the comparisons on each next(), though the time complexity is still O(log N).
 Currently a scan or get goes through two KeyValueHeaps: one merges KVs 
 read from multiple HFiles in a single store, the other merges results 
 from multiple stores. This patch should improve both cases whenever CPU 
 is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
 All of the optimization work is done in KeyValueHeap and does not change its 
 public interfaces. The new code looks cleaner and is simpler to understand.
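
For illustration, a self-contained sketch of a loser-tree k-way merge (invented class and field names; this is not the attached patch). Each next() replays only the winner's path, costing one comparison per level where a binary heap's sift-down typically costs about two:

{code}
import java.util.*;

final class LoserTreeMerge<T> implements Iterator<T> {
  private final Comparator<? super T> cmp;
  private final List<? extends Iterator<? extends T>> in; // k sorted inputs
  private final List<T> head;  // head.get(i) = current head of input i; null = exhausted
  private final int[] loser;   // loser[n] = losing input index at internal node n (1..k-1)
  private final int k;
  private int winner;          // input index of the overall winner

  LoserTreeMerge(List<? extends Iterator<? extends T>> inputs, Comparator<? super T> cmp) {
    this.in = inputs;
    this.cmp = cmp;
    this.k = inputs.size();
    this.head = new ArrayList<T>(k);
    for (Iterator<? extends T> it : inputs) {
      head.add(it.hasNext() ? it.next() : null);
    }
    this.loser = new int[k];
    int[] winners = new int[2 * k];       // winners[n] = winning input of subtree n
    for (int i = 0; i < k; i++) {
      winners[k + i] = i;                 // leaves occupy nodes k..2k-1
    }
    for (int n = k - 1; n >= 1; n--) {    // build bottom-up
      int a = winners[2 * n], b = winners[2 * n + 1];
      int w = better(a, b);
      winners[n] = w;
      loser[n] = (w == a) ? b : a;        // remember the loser at this node
    }
    winner = winners[1];
  }

  private int better(int a, int b) {      // exhausted inputs always lose
    T x = head.get(a), y = head.get(b);
    if (x == null) return b;
    if (y == null) return a;
    return cmp.compare(x, y) <= 0 ? a : b;
  }

  @Override public boolean hasNext() { return head.get(winner) != null; }

  @Override public T next() {
    T out = head.get(winner);
    if (out == null) throw new NoSuchElementException();
    Iterator<? extends T> it = in.get(winner);
    head.set(winner, it.hasNext() ? it.next() : null);
    // Replay only the winner's path: ONE comparison per level, where a binary
    // heap sift-down compares the two children and then the parent.
    int s = winner;
    for (int n = (k + s) / 2; n >= 1; n /= 2) {
      if (better(s, loser[n]) != s) {     // the stored loser now beats us
        int tmp = loser[n]; loser[n] = s; s = tmp;
      }
    }
    winner = s;
    return out;
  }

  @Override public void remove() { throw new UnsupportedOperationException(); }
}
{code}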



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-08-01 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726635#comment-13726635
 ] 

Liyin Tang commented on HBASE-9102:
---

Chao, you are right that the pre-load will run in a rate-limited fashion to make 
sure it won't pollute the block cache substantially.
The pre-loading targets the large sequential scan case. The client is able to 
enable/disable it on a per-request basis. 


 HFile block pre-loading for large sequential scan
 -

 Key: HBASE-9102
 URL: https://issues.apache.org/jira/browse/HBASE-9102
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89-fb
Reporter: Liyin Tang
Assignee: Liyin Tang

 The current HBase scan model cannot take full advantage of the aggregate 
 disk throughput, especially for the large sequential scan cases. For a 
 large sequential scan, it is easy to predict the next block to read in 
 advance, so we can pre-load and decompress/decode these data blocks from 
 HDFS into the block cache right before the current read point. 
 Therefore, this jira is to optimize the large sequential scan performance by 
 pre-loading the HFile blocks into the block cache in a streaming fashion so that 
 the scan query can read from the cache directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-07-31 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-9102:
-

 Summary: HFile block pre-loading for large sequential scan
 Key: HBASE-9102
 URL: https://issues.apache.org/jira/browse/HBASE-9102
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89-fb
Reporter: Liyin Tang
Assignee: Liyin Tang


The current HBase scan model cannot take full advantage of the aggregate disk 
throughput, especially for the large sequential scan cases. For a large 
sequential scan, it is easy to predict the next block to read in advance, so we 
can pre-load and decompress/decode these data blocks from HDFS into the block 
cache right before the current read point. 

Therefore, this jira is to optimize the large sequential scan performance by 
pre-loading the HFile blocks into the block cache in a streaming fashion so that 
the scan query can read from the cache directly.
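
A minimal sketch of the idea under stated assumptions: nextBlockOffsets(), loadAndDecode() and blockCache are invented stand-ins for block-offset prediction, the HDFS read plus decompress/decode step, and the block cache; a real version would also rate-limit, as noted in the comment above.

{code}
private final ExecutorService preloader = Executors.newSingleThreadExecutor();
private static final int PRELOAD_BLOCKS = 4;   // arbitrary illustration value

void onDataBlockRead(final long currentOffset) {
  preloader.submit(new Runnable() {
    @Override public void run() {
      // sequential scans make the next offsets trivially predictable
      for (long offset : nextBlockOffsets(currentOffset, PRELOAD_BLOCKS)) {
        if (!blockCache.containsBlock(offset)) {
          // read from HDFS and decompress/decode ahead of the read point
          blockCache.cacheBlock(offset, loadAndDecode(offset));
        }
      }
    }
  });
}
{code}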

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7266) [89-fb] Using pread for non-compaction read request

2013-07-31 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725443#comment-13725443
 ] 

Liyin Tang commented on HBASE-7266:
---

Chao, we have switched all the read operations to pread in the 89-fb branch. 
There are two follow-up tasks for pread: 1) have the DFSClient maintain a 
connection pool instead of creating a new connection for each pread operation; 
2) have HBase actively pre-load the next several blocks in a streaming fashion 
for large sequential scans (HBASE-9102)

 [89-fb] Using pread for non-compaction read request
 ---

 Key: HBASE-7266
 URL: https://issues.apache.org/jira/browse/HBASE-7266
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang

 There are 2 kinds of read operations in HBase: pread and seek+read.
 Pread, positional read, is stateless and creates a new connection between the 
 DFSClient and DataNode for each operation, while seek+read seeks to a 
 specific position and prefetches blocks from the data nodes. The benefit of 
 seek+read is that it caches the prefetched result, but the downside is that 
 it is stateful and needs to be synchronized.
 So far, both compaction and scan are using seek+read, which caused some 
 resource contention. Using pread for the scan request can avoid that 
 resource contention. In addition, the region server is able to do the 
 prefetch for the scan request (HBASE-6874) so that it won't be necessary to 
 let the DFSClient prefetch the data any more.
 I will run through the scan benchmark (with no block cache) to verify the 
 performance.
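
To illustrate the two modes with the plain HDFS client API (illustration only, not HBase code):

{code}
FSDataInputStream in = fs.open(path);
byte[] buf = new byte[64 * 1024];
long offset = 0L;  // example position

// pread: the position is an explicit argument and the stream state is
// untouched, so concurrent readers need no coordination
in.read(offset, buf, 0, buf.length);

// seek+read: mutates the stream position and benefits from the stream's
// prefetching, but concurrent users must serialize around the stream
synchronized (in) {
  in.seek(offset);
  in.readFully(buf, 0, buf.length);
}
{code}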

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan

2013-07-31 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725515#comment-13725515
 ] 

Liyin Tang commented on HBASE-9102:
---

It is true that the OS caches the compressed/encoded blocks and the DFSClient 
non-pread operation is also able to pre-load all the bytes up to the end of that 
DFS block. This feature pre-loads (decompresses/decodes) these data blocks in 
addition to the OS cache/disk read-ahead.

Also, the scan prefetch is currently implemented at the RegionScanner level. I 
think it is a good idea to implement some prefetch logic in the HBase client as 
well.

 HFile block pre-loading for large sequential scan
 -

 Key: HBASE-9102
 URL: https://issues.apache.org/jira/browse/HBASE-9102
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89-fb
Reporter: Liyin Tang
Assignee: Liyin Tang

 The current HBase scan model cannot take full advantage of the aggregate 
 disk throughput, especially for the large sequential scan cases. For a 
 large sequential scan, it is easy to predict the next block to read in 
 advance, so we can pre-load and decompress/decode these data blocks from 
 HDFS into the block cache right before the current read point. 
 Therefore, this jira is to optimize the large sequential scan performance by 
 pre-loading the HFile blocks into the block cache in a streaming fashion so that 
 the scan query can read from the cache directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly

2013-06-27 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6930:
--

Attachment: (was: D5841.2.patch)

 [89-fb] Avoid acquiring the same row lock repeatedly
 

 Key: HBASE-6930
 URL: https://issues.apache.org/jira/browse/HBASE-6930
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: HBASE-6930.diff


 When processing multiPut, multiMutation or multiDelete operations, each 
 IPC handler thread tries to acquire a lock for each row key in these batches. 
 If there are duplicated row keys in these batches, previously the IPC handler 
 thread would repeatedly acquire the same row lock again and again.
 So the optimization is to sort each batch operation based on the row key on 
 the client side, and skip acquiring the same row lock repeatedly on the 
 server side.
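
A minimal sketch of the server-side half (acquireRowLock and applyMutation are invented stand-ins): with the batch sorted by row key, duplicates are adjacent and only the first occurrence takes the lock.

{code}
// batch arrives sorted by row key, so duplicate rows are adjacent
byte[] previousRow = null;
for (Mutation m : batch) {
  byte[] row = m.getRow();
  if (previousRow == null || Bytes.compareTo(previousRow, row) != 0) {
    acquireRowLock(row);   // only for the first occurrence of each row
    previousRow = row;
  }
  applyMutation(m);        // duplicates reuse the already-held lock
}
{code}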

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly

2013-06-27 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6930:
--

Attachment: HBASE-6930.diff

 [89-fb] Avoid acquiring the same row lock repeatedly
 

 Key: HBASE-6930
 URL: https://issues.apache.org/jira/browse/HBASE-6930
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: HBASE-6930.diff


 When processing multiPut, multiMutation or multiDelete operations, each 
 IPC handler thread tries to acquire a lock for each row key in these batches. 
 If there are duplicated row keys in these batches, previously the IPC handler 
 thread would repeatedly acquire the same row lock again and again.
 So the optimization is to sort each batch operation based on the row key on 
 the client side, and skip acquiring the same row lock repeatedly on the 
 server side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly

2013-06-27 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694916#comment-13694916
 ] 

Liyin Tang commented on HBASE-6930:
---

The patch here seems to be attached to the wrong jira. Sorry about the confusion.
I have re-attached the patch here.

 [89-fb] Avoid acquiring the same row lock repeatedly
 

 Key: HBASE-6930
 URL: https://issues.apache.org/jira/browse/HBASE-6930
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: HBASE-6930.diff


 When processing multiPut, multiMutation or multiDelete operations, each 
 IPC handler thread tries to acquire a lock for each row key in these batches. 
 If there are duplicated row keys in these batches, previously the IPC handler 
 thread would repeatedly acquire the same row lock again and again.
 So the optimization is to sort each batch operation based on the row key on 
 the client side, and skip acquiring the same row lock repeatedly on the 
 server side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly

2013-06-27 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6930:
--

Attachment: (was: D5841.1.patch)

 [89-fb] Avoid acquiring the same row lock repeatedly
 

 Key: HBASE-6930
 URL: https://issues.apache.org/jira/browse/HBASE-6930
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: HBASE-6930.diff


 When processing multiPut, multiMutation or multiDelete operations, each 
 IPC handler thread tries to acquire a lock for each row key in these batches. 
 If there are duplicated row keys in these batches, previously the IPC handler 
 thread would repeatedly acquire the same row lock again and again.
 So the optimization is to sort each batch operation based on the row key on 
 the client side, and skip acquiring the same row lock repeatedly on the 
 server side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.

2013-06-26 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694487#comment-13694487
 ] 

Liyin Tang commented on HBASE-8806:
---

Is this issue similar to HBASE-6930? We solved the problem by sorting the rows 
for each multiput batch on the client side. 

 Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for 
 duplicate rows.
 

 Key: HBASE-8806
 URL: https://issues.apache.org/jira/browse/HBASE-8806
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.5
Reporter: rahul gidwani
 Fix For: 0.95.2, 0.94.10

 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch


 If we already have the lock in doMiniBatchMutation, we don't need to 
 re-acquire it. The fix is to keep a set of the rowKeys already locked for a 
 miniBatchMutation; if a rowKey is already in the set, we don't repeatedly try 
 to acquire its lock.
 We have tested this fix in our production environment and it has improved 
 replication performance quite a bit.  We saw a replication batch go from 3+ 
 minutes to less than 10 seconds for batches with duplicate row keys.
 {code}
 static int ACQUIRE_LOCK_COUNT = 0;  // not final: incremented by getLock below

   @Test
   public void testRedundantRowKeys() throws Exception {
     final int batchSize = 10;

     String tableName = getClass().getSimpleName();
     Configuration conf = HBaseConfiguration.create();
     conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
     MockHRegion region = (MockHRegion) TestHRegion.initHRegion(
         Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
     List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
     int i = 0;
     while (i < batchSize) {
       if (i % 2 == 0) {
         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), null));
       } else {
         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), null));
       }
       i++;
     }
     long startTime = System.currentTimeMillis();
     region.batchMutate(someBatch.toArray(new Pair[0]));
     long endTime = System.currentTimeMillis();
     long duration = endTime - startTime;
     System.out.println("duration: " + duration + " ms");
     // only two distinct rows in the batch, so only two lock acquisitions expected
     assertEquals(2, ACQUIRE_LOCK_COUNT);
   }

   @Override
   public Integer getLock(Integer lockid, byte[] row, boolean waitForLock)
       throws IOException {
     ACQUIRE_LOCK_COUNT++;
     return super.getLock(lockid, row, waitForLock);
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2013-06-18 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7055:
--

Attachment: Tier Based Compaction Settings.pdf

The tier based compaction settings.

 port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
 --

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
  Components: Compaction
Affects Versions: 0.95.2
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2

 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, 
 HBASE-6371-v3-refactor-only-squashed.patch, 
 HBASE-6371-v4-refactor-only-squashed.patch, 
 HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, 
 HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, 
 HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, 
 HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf


 See HBASE-6371 for details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2013-06-18 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687476#comment-13687476
 ] 

Liyin Tang commented on HBASE-7055:
---

Unfortunately, we haven't configured this tier-based compaction for our 
applications, either.


 port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
 --

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
  Components: Compaction
Affects Versions: 0.95.2
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.2

 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, 
 HBASE-6371-v3-refactor-only-squashed.patch, 
 HBASE-6371-v4-refactor-only-squashed.patch, 
 HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, 
 HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, 
 HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, 
 HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf


 See HBASE-6371 for details.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-5263) Preserving cached data on compactions through cache-on-write

2013-04-11 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-5263:
-

Assignee: Rishit Shroff  (was: Mikhail Bautin)

 Preserving cached data on compactions through cache-on-write
 

 Key: HBASE-5263
 URL: https://issues.apache.org/jira/browse/HBASE-5263
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Rishit Shroff
Priority: Minor

 We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the 
 block cache on compactions if cache-on-write is enabled. However, it would be 
 ideal to reduce the effect compactions have on the cached data. For every 
 block we are writing for a compacted file we can decide whether it needs to 
 be cached based on whether the original blocks containing the same data were 
 already in cache. More precisely, for every HFile reader in a compaction we 
 can maintain a boolean flag saying whether the current key-value came from a 
 disk IO or the block cache. In the HFile writer for the compaction's output 
 we can maintain a flag that is set if any of the key-values in the block 
 being written came from a cached block, use that flag at the end of a block 
 to decide whether to cache-on-write the block, and reset the flag to false on 
 a block boundary. If such an inclusive approach would still trash the cache, 
 we could restrict the total number of blocks to be cached per output 
 HFile, switch to an "and" logic instead of an "or" logic for deciding whether to 
 cache an output file block, or only cache a certain percentage of output file 
 blocks that contain some of the previously cached data. 
 Thanks to Nicolas for this elegant online algorithm idea!
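
A sketch of that online rule, with writer and blockCache as invented stand-ins (the flag handling mirrors the description above, not actual HBase code):

{code}
// per-output-block flag: did any KV in this block come from a cached block?
private boolean blockHasCachedData = false;

void append(KeyValue kv, boolean kvCameFromCache) {
  writer.append(kv);
  blockHasCachedData |= kvCameFromCache;   // the "or" logic described above
}

void onBlockBoundary(HFileBlock finishedBlock) {
  if (blockHasCachedData) {
    blockCache.cacheBlock(finishedBlock);  // preserve previously-hot data
  }
  blockHasCachedData = false;              // reset on the block boundary
}
{code}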

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row

2013-02-28 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589763#comment-13589763
 ] 

Liyin Tang commented on HBASE-4433:
---

Hi Lars, the jira Kannan mentioned is [HBASE-5987] HFileBlockIndex 
improvements. By looking ahead at the next indexed key, the HBase internal 
reader knows whether to keep scanning the current DataBlock or look up the 
index. This feature avoids the additional index lookup overhead when multiple 
requests are sequentially scanning the HFile data blocks.

Actually, we have a list of jiras in our FB internal HBase release. Do you know 
a proper place where we could share this work with more hbase-dev?

 avoid extra next (potentially a seek) if done with column/row
 -

 Key: HBASE-4433
 URL: https://issues.apache.org/jira/browse/HBASE-4433
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan
 Fix For: 0.92.0


 [Noticed this in 89, but quite likely true of trunk as well.]
 When we are done with the requested column(s) the code still does an extra 
 next() call before it realizes that it is actually done. This extra next() 
 call could potentially result in an unnecessary extra block load. This is 
 likely to be especially bad for CFs where the KVs are large blobs where each 
 KV may be occupying a block of its own. So the next() can often load a new 
 unrelated block unnecessarily.
 --
 For the simple case of reading say the top-most column in a row in a single 
 file, where each column (KV) was say a block of its own-- it seems that we 
 are reading 3 blocks, instead of 1 block!
 I am working on a simple patch and with that the number of seeks is down to 
 2. 
 [There is still an extra seek left.  I think there were two levels of 
 extra/unnecessary next() we were doing without actually confirming that the 
 next was needed. One at the StoreScanner/ScanQueryMatcher level, which this 
 diff avoids. I think the other is at hfs.next() (at the storefile scanner 
 level), which happens whenever an HFile scanner serves out a data block-- and 
 perhaps that's the additional seek that we need to avoid. But I want to 
 tackle this optimization first as the two issues seem unrelated.]
 -- 
 The basic idea of the patch I am working on/testing is as follows. The 
 ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if 
 the KV needs to be included and then if done, only in the the next call it 
 returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases 
 when ExplicitColumnTracker knows it is done with a particular column/row, the 
 patch attempts to combine the INCLUDE code and done hint into a single match 
 code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW.
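
A simplified sketch of the combined hints (the INCLUDE_AND_SEEK_* names come from the jira text; the surrounding matcher fragment and helper methods are invented for illustration):

{code}
enum MatchCode {
  INCLUDE,                    // emit the KV and keep going
  SEEK_NEXT_COL,              // done with this column, skip ahead
  SEEK_NEXT_ROW,              // done with this row, skip ahead
  INCLUDE_AND_SEEK_NEXT_COL,  // emit the KV *and* seek, in a single hint
  INCLUDE_AND_SEEK_NEXT_ROW
}

// Invented tracker fragment: when this KV is the last one wanted from the
// column (or row), fold the seek hint into the INCLUDE answer so the scanner
// never issues the extra next() that could load an unrelated block.
MatchCode checkColumn(byte[] qualifier) {
  if (!isLastWantedVersion(qualifier)) {        // invented helper
    return MatchCode.INCLUDE;
  }
  return isLastRequestedColumn(qualifier)       // invented helper
      ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
      : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
}
{code}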

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7444) [89-fb] Update the default user name in MiniHBaseCluster

2012-12-27 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7444:
-

 Summary: [89-fb] Update the default user name in MiniHBaseCluster
 Key: HBASE-7444
 URL: https://issues.apache.org/jira/browse/HBASE-7444
 Project: HBase
  Issue Type: Test
Reporter: Liyin Tang
Priority: Minor


Currently we are using $username.hrs.$index as the default user name in 
MiniHBaseCluster, which actually is not a legal user name. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7416) [89-fb] Tier compaction with fixed boundary option

2012-12-20 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7416:
-

 Summary:  [89-fb] Tier compaction with fixed boundary option
 Key: HBASE-7416
 URL: https://issues.apache.org/jira/browse/HBASE-7416
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Assignee: Chen Jin


Currently, in tier compaction the age-based algorithm considers an HFile's age 
on disk relative to the current time, so the tiers are actually shifting along 
with time. In order to make the best use of our prior information about how 
applications consume the data, we need another feature that defines the tiers 
relative to a fixed time point.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7416) [89-fb] Tier compaction with fixed boundary option

2012-12-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7416:
--

Assignee: (was: Chen Jin)

  [89-fb] Tier compaction with fixed boundary option
 ---

 Key: HBASE-7416
 URL: https://issues.apache.org/jira/browse/HBASE-7416
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang

 Currently, in tier compaction the age-based algorithm considers an HFile's 
 age on disk relative to the current time, so the tiers are actually shifting 
 along with time. In order to make the best use of our prior information about 
 how applications consume the data, we need another feature that defines the 
 tiers relative to a fixed time point.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-5776) HTableMultiplexer

2012-12-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-5776:
-

Assignee: binlijin  (was: Liyin Tang)

 HTableMultiplexer 
 --

 Key: HBASE-5776
 URL: https://issues.apache.org/jira/browse/HBASE-5776
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: binlijin
 Attachments: ASF.LICENSE.NOT.GRANTED--D2775.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.1.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.2.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.3.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.4.patch, 
 ASF.LICENSE.NOT.GRANTED--D2775.5.patch, HBASE-5776-trunk.patch, 
 HBASE-5776-trunk-V2.patch


 There is a known issue in the HBase client that a single slow/dead region 
 server could slow down the multiput operations across all the region servers, 
 so the HBase client will be as slow as the slowest region server in the cluster. 
  
 To solve this problem, HTableMultiplexer separates the multiput 
 submitting threads from the flush threads, which means the multiput operation 
 will be a nonblocking operation. 
 The submitting thread shards all the puts into different queues based on 
 their destination region servers and returns immediately. The flush threads 
 flush these puts from each queue to its destination region server. 
 Currently the HTableMultiplexer only supports the put operation.
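
A minimal sketch of that design (MAX_QUEUED_PUTS and startFlusherFor are invented stand-ins; the real class adds bounded buffering and retry logic):

{code}
// one queue per destination region server; flush threads drain them
private final ConcurrentMap<ServerName, BlockingQueue<Put>> queues =
    new ConcurrentHashMap<ServerName, BlockingQueue<Put>>();

boolean put(ServerName destination, Put put) {
  BlockingQueue<Put> queue = queues.get(destination);
  if (queue == null) {
    BlockingQueue<Put> fresh = new LinkedBlockingQueue<Put>(MAX_QUEUED_PUTS);
    queue = queues.putIfAbsent(destination, fresh);
    if (queue == null) {
      queue = fresh;
      startFlusherFor(destination, queue);  // invented: spawns the flush thread
    }
  }
  // non-blocking: a full queue for a slow server fails fast instead of
  // stalling puts destined for healthy servers
  return queue.offer(put);
}
{code}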

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7348) [89-fb] Add some statistics from DFSClient to RegionServerMetrics

2012-12-13 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7348:
-

 Summary: [89-fb] Add some statistics from DFSClient to 
RegionServerMetrics
 Key: HBASE-7348
 URL: https://issues.apache.org/jira/browse/HBASE-7348
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


DFSClient actually collects a number of useful statistics, such as 
bytesLocalRead, bytesLocalRackRead and so on. So this diff is going to merge 
these metrics into the RegionServerMetrics.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7275) [89-fb] Fixing some minor bugs in 89-fb branch

2012-12-04 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7275:
-

 Summary: [89-fb] Fixing some minor bugs in 89-fb branch
 Key: HBASE-7275
 URL: https://issues.apache.org/jira/browse/HBASE-7275
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Priority: Minor


[89-fb] Fixing some minor bugs in the 89-fb branch based on the FindBugs report

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.

2012-12-04 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7267:
--

Description: 
In HBASE-6059, a new behavior was introduced where the compaction would create 
the HFileWriter no matter whether there is any key/value as the output or not. 
This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
store files before compaction selection), so compacting the expired hfiles 
would generate one more expired hfile.

Actually we only need to create the dummy hfile IFF the maxSequenceID among 
the compaction candidates is equal to the maxSequenceID among all the on-disk 
hfiles. 

  was:
In HBASE-6059, a new behavior was introduced where the compaction would create 
the HFileWriter no matter whether there is any key/value as the output or not. 
This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
store files before compaction selection), so compacting the expired hfiles 
would generate one more expired hfile.

Actually we only need to create the dummy hfile IFF the maxSequenceID among 
the compaction candidates is equal to the maxSequenceID among all the 
on-disk hfiles. 


 [89-fb] Only create the dummy hfile for the compaction if necessary.
 

 Key: HBASE-7267
 URL: https://issues.apache.org/jira/browse/HBASE-7267
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang

 In HBASE-6059, a new behavior was introduced where the compaction would create 
 the HFileWriter no matter whether there is any key/value as the output or not. 
 This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
 store files before compaction selection), so compacting the expired 
 hfiles would generate one more expired hfile.
 Actually we only need to create the dummy hfile IFF the maxSequenceID among 
 the compaction candidates is equal to the maxSequenceID among all the on-disk 
 hfiles. 
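
The condition reduces to a comparison of max sequence ids. A sketch (needDummyHFile is an invented name; getMaxSequenceId follows the idea, though the real StoreFile details differ):

{code}
// write the empty "dummy" hfile only when the compaction output is the sole
// carrier of the store's max sequence id; otherwise an on-disk hfile outside
// the selection already preserves it
boolean needDummyHFile(List<StoreFile> candidates, List<StoreFile> allOnDisk) {
  long candidateMax = -1;
  for (StoreFile sf : candidates) {
    candidateMax = Math.max(candidateMax, sf.getMaxSequenceId());
  }
  long onDiskMax = -1;
  for (StoreFile sf : allOnDisk) {
    onDiskMax = Math.max(onDiskMax, sf.getMaxSequenceId());
  }
  return candidateMax == onDiskMax;
}
{code}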

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7276) [89-fb] Optimize the read/write requests metrics in the RegionServer level

2012-12-04 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7276:
-

 Summary: [89-fb] Optimize the read/write requests metrics in the 
RegionServer level
 Key: HBASE-7276
 URL: https://issues.apache.org/jira/browse/HBASE-7276
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


In HBase, each RegionServer hosts a set of Regions, and both the Regions and the 
RegionServer keep track of read/write request metrics. So the total number of 
read/write requests among all the Regions shall be equal to the total number 
from the RegionServer. We shall optimize the code to remove the redundant 
metrics at the RegionServer level and merge the Region-level metrics into the 
RegionServer level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7266) [89-fb] Using pread for non-compaction read request

2012-12-03 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7266:
-

 Summary: [89-fb] Using pread for non-compaction read request
 Key: HBASE-7266
 URL: https://issues.apache.org/jira/browse/HBASE-7266
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


There are 2 kinds of read operations in HBase: pread and seek+read.
Pread, positional read, is stateless and creates a new connection between the 
DFSClient and DataNode for each operation, while seek+read seeks to a 
specific position and prefetches blocks from the data nodes. The benefit of 
seek+read is that it caches the prefetched result, but the downside is that it 
is stateful and needs to be synchronized.

So far, both compaction and scan are using pread, which caused some resource 
contention. So using pread for the scan request can avoid the resource 
contention. In addition, the region server is able to do the prefetch for the 
scan request (HBASE-6874) so that it won't be necessary to let the DFSClient 
prefetch the data any more.

I will run through the scan benchmark (with no block cache) to verify the 
performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7266) [89-fb] Using pread for non-compaction read request

2012-12-03 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7266:
--

Description: 
There are 2 kinds of read operations in HBase: pread and seek+read.
Pread, positional read, is stateless and creates a new connection between the 
DFSClient and DataNode for each operation, while seek+read seeks to a 
specific position and prefetches blocks from the data nodes. The benefit of 
seek+read is that it caches the prefetched result, but the downside is that it 
is stateful and needs to be synchronized.

So far, both compaction and scan are using seek+read, which caused some 
resource contention. So using pread for the scan request can avoid the 
resource contention. In addition, the region server is able to do the prefetch 
for the scan request (HBASE-6874) so that it won't be necessary to let the 
DFSClient prefetch the data any more.

I will run through the scan benchmark (with no block cache) to verify the 
performance.

  was:
There are 2 kinds of read operations in HBase: pread and seek+read.
Pread, positional read, is stateless and creates a new connection between the 
DFSClient and DataNode for each operation, while seek+read seeks to a 
specific position and prefetches blocks from the data nodes. The benefit of 
seek+read is that it caches the prefetched result, but the downside is that it 
is stateful and needs to be synchronized.

So far, both compaction and scan are using pread, which caused some resource 
contention. So using pread for the scan request can avoid the resource 
contention. In addition, the region server is able to do the prefetch for the 
scan request (HBASE-6874) so that it won't be necessary to let the DFSClient 
prefetch the data any more.

I will run through the scan benchmark (with no block cache) to verify the 
performance.


 [89-fb] Using pread for non-compaction read request
 ---

 Key: HBASE-7266
 URL: https://issues.apache.org/jira/browse/HBASE-7266
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang

 There are 2 kinds of read operations in HBase: pread and seek+read.
 Pread, positional read, is stateless and creates a new connection between the 
 DFSClient and DataNode for each operation, while seek+read seeks to a 
 specific position and prefetches blocks from the data nodes. The benefit of 
 seek+read is that it caches the prefetched result, but the downside is that 
 it is stateful and needs to be synchronized.
 So far, both compaction and scan are using seek+read, which caused some 
 resource contention. Using pread for the scan request can avoid that 
 resource contention. In addition, the region server is able to do the 
 prefetch for the scan request (HBASE-6874) so that it won't be necessary to 
 let the DFSClient prefetch the data any more.
 I will run through the scan benchmark (with no block cache) to verify the 
 performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.

2012-12-03 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7267:
-

 Summary: [89-fb] Only create the dummy hfile for the compaction if 
necessary.
 Key: HBASE-7267
 URL: https://issues.apache.org/jira/browse/HBASE-7267
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang


In HBASE-6059, a new behavior was introduced where the compaction would create 
the HFileWriter no matter whether there is any key/value as the output or not. 
This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
store files before compaction selection), so compacting the expired hfiles 
would generate one more expired hfile.

Actually we only need to create the dummy hfile iff the maxSequenceID among 
the compaction candidates is equal to the maxSequenceID among all the on-disk 
hfiles. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.

2012-12-03 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7267:
--

Description: 
In HBASE-6059, a new behavior was introduced where the compaction would create 
the HFileWriter no matter whether there is any key/value as the output or not. 
This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
store files before compaction selection), so compacting the expired hfiles 
would generate one more expired hfile.

Actually we only need to create the dummy hfile IFF the maxSequenceID among 
the compaction candidates is equal to the maxSequenceID among all the 
on-disk hfiles. 

  was:
In HBASE-6059, a new behavior was introduced where the compaction would create 
the HFileWriter no matter whether there is any key/value as the output or not. 
This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
store files before compaction selection), so compacting the expired hfiles 
would generate one more expired hfile.

Actually we only need to create the dummy hfile iff the maxSequenceID among 
the compaction candidates is equal to the maxSequenceID among all the on-disk 
hfiles. 


 [89-fb] Only create the dummy hfile for the compaction if necessary.
 

 Key: HBASE-7267
 URL: https://issues.apache.org/jira/browse/HBASE-7267
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang

 In HBASE-6059, a new behavior was introduced where the compaction would create 
 the HFileWriter no matter whether there is any key/value as the output or not. 
 This new behavior actually conflicts with HBASE-5199 (Delete out-of-TTL 
 store files before compaction selection), so compacting the expired 
 hfiles would generate one more expired hfile.
 Actually we only need to create the dummy hfile IFF the maxSequenceID among 
 the compaction candidates is equal to the maxSequenceID among all the 
 on-disk hfiles. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7164) [89-fb] Using HFileOutputFormat as MapOutputFormat

2012-11-14 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7164:
-

 Summary: [89-fb] Using HFileOutputFormat as MapOutputFormat
 Key: HBASE-7164
 URL: https://issues.apache.org/jira/browse/HBASE-7164
 Project: HBase
  Issue Type: New Feature
Reporter: Liyin Tang
Priority: Minor


Add one more option to TableMapReduceUtil to initialize a map-only job which 
takes TableInputFormat as the MapInputFormat and HFileOutputFormat as the 
MapOutputFormat. 
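
A sketch of the wiring with the trunk-style mapreduce API (not the 89-fb code; KeyValueMapper is an invented user mapper emitting ImmutableBytesWritable/KeyValue pairs, which is what HFileOutputFormat consumes):

{code}
Job job = Job.getInstance(conf, "table-to-hfiles");
TableMapReduceUtil.initTableMapperJob(
    "mytable", new Scan(), KeyValueMapper.class,       // TableInputFormat set here
    ImmutableBytesWritable.class, KeyValue.class, job);
job.setNumReduceTasks(0);                              // map-only: no shuffle/sort
job.setOutputFormatClass(HFileOutputFormat.class);     // maps write HFiles directly
FileOutputFormat.setOutputPath(job, new Path("/tmp/hfiles-out"));
job.waitForCompletion(true);
{code}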

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7

2012-11-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493401#comment-13493401
 ] 

Liyin Tang commented on HBASE-7106:
---

Gustavo Anatoly: I didn't fully understand your questions :) The pom change is 
orthogonal to the code change.

Jimmy, the semantics of a NULL column qualifier is equal to that of the 
EMPTY_BYTE_ARRAY column qualifier. 
However, the fix in HBASE-6206 will skip the NULL qualifier:
-set.add(qualifier);
+if (qualifier != null) {
+  set.add(qualifier);
+}

=====
I think the correct fix shall be:

if (qualifier != null) {
  set.add(qualifier);
} else {
  set.add(HConstants.EMPTY_BYTE_ARRAY);
}

 [89-fb] Fix the NPE in unit tests for JDK7
 --

 Key: HBASE-7106
 URL: https://issues.apache.org/jira/browse/HBASE-7106
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial

 In JDK7, it will throw an NPE if you put a NULL into a TreeSet. And in the unit 
 tests, a user can add a NULL qualifier into the family map for a GET or SCAN. 
 So we shall do the following: 
 1) Make sure the semantics of the NULL column qualifier is equal to that of the 
 EMPTY_BYTE_ARRAY column qualifier.
 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL 
 qualifier in the family map for the GET or SCAN objects, and everything else 
 shall be backward compatible.
 3) Add a jdk option in the pom.xml (assuming the user installed the fb packaged 
 jdk),
 e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7

2012-11-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493478#comment-13493478
 ] 

Liyin Tang commented on HBASE-7106:
---

Gustavo Anatoly, sure! 

 [89-fb] Fix the NPE in unit tests for JDK7
 --

 Key: HBASE-7106
 URL: https://issues.apache.org/jira/browse/HBASE-7106
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial

 In JDK7, it will throw an NPE if you put a NULL into a TreeSet. And in the unit 
 tests, a user can add a NULL qualifier into the family map for a GET or SCAN. 
 So we shall do the following: 
 1) Make sure the semantics of the NULL column qualifier is equal to that of the 
 EMPTY_BYTE_ARRAY column qualifier.
 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL 
 qualifier in the family map for the GET or SCAN objects, and everything else 
 shall be backward compatible.
 3) Add a jdk option in the pom.xml (assuming the user installed the fb packaged 
 jdk),
 e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction

2012-11-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6371:
--

Attachment: (was: HBase_Tier_Base_Compaction.pdf)

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob
 Attachments: HBASE-6371-089fb-commit.patch, 
 HBase_Tier_Base_Compaction.pdf


 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases the access pattern is particularly obvious, 
 yet there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to the age of the data present 
 in those files. For each level, parameters like the compaction ratio and the 
 minimum number of store-files in each compaction may differ. The number of 
 levels, time-ranges, and parameters for each level will be configurable online 
 on a per-column-family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction

2012-11-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6371:
--

Attachment: HBase_Tier_Base_Compaction.pdf

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob
 Attachments: HBASE-6371-089fb-commit.patch, 
 HBase_Tier_Base_Compaction.pdf


 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases the access pattern is particularly obvious, 
 yet there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to the age of the data present 
 in those files. For each level, parameters like the compaction ratio and the 
 minimum number of store-files in each compaction may differ. The number of 
 levels, time-ranges, and parameters for each level will be configurable online 
 on a per-column-family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7

2012-11-06 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-7106:
-

 Summary: [89-fb] Fix the NPE in unit tests for JDK7
 Key: HBASE-7106
 URL: https://issues.apache.org/jira/browse/HBASE-7106
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial


In JDK7, it will throw an NPE if you put a NULL into a TreeSet. So the easy fix 
is to skip putting the NULL qualifier into the family map for the GET and SCAN 
objects, and everything else shall be backward compatible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7

2012-11-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7106:
--

Description: 
In JDK7, it will throw an NPE if you put a NULL into a TreeSet. And in the unit 
tests, a user can add a NULL qualifier into the family map for a GET or SCAN. 
So we shall do the following: 

1) Make sure the semantics of the NULL column qualifier is equal to that of the 
EMPTY_BYTE_ARRAY column qualifier.

2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL 
qualifier in the family map for the GET or SCAN objects, and everything else 
shall be backward compatible.

3) Add a jdk option in the pom.xml (assuming the user installs the fb packaged jdk),
e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

  was:In JDK7, it will throw an NPE if you put a NULL into a TreeSet. So the easy 
fix is to skip putting the NULL qualifier into the family map for the GET and 
SCAN objects, and everything else shall be backward compatible.


 [89-fb] Fix the NPE in unit tests for JDK7
 --

 Key: HBASE-7106
 URL: https://issues.apache.org/jira/browse/HBASE-7106
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial

 In JDK7, it will throw an NPE if you put a NULL into a TreeSet. And in the unit 
 tests, a user can add a NULL qualifier into the family map for a GET or SCAN. 
 So we shall do the following: 
 1) Make sure the semantics of the NULL column qualifier is equal to that of the 
 EMPTY_BYTE_ARRAY column qualifier.
 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL 
 qualifier in the family map for the GET or SCAN objects, and everything else 
 shall be backward compatible.
 3) Add a jdk option in the pom.xml (assuming the user installs the fb packaged jdk),
 e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7

2012-11-06 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-7106:
--

Description: 
In JDK7, putting a NULL into a TreeSet throws an NPE, and in the unit tests a
user can add a NULL qualifier to the family map for GET or SCAN. So we shall do
the following:

1) Make sure the semantics of a NULL column qualifier are equal to those of the
EMPTY_BYTE_ARRAY column qualifier.

2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL
qualifier in the family map for the GET or SCAN objects; everything else
remains backward compatible.

3) Add a jdk option in the pom.xml (assuming the user installed the fb-packaged
jdk), e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

  was:
In JDK7, putting a NULL into a TreeSet throws an NPE, and in the unit tests a
user can add a NULL qualifier to the family map for GET or SCAN. So we shall do
the following:

1) Make sure the semantics of a NULL column qualifier are equal to those of the
EMPTY_BYTE_ARRAY column qualifier.

2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL
qualifier in the family map for the GET or SCAN objects; everything else
remains backward compatible.

3) Add a jdk option in the pom.xml (assuming the user installed the fb-packaged
jdk), e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7


 [89-fb] Fix the NPE in unit tests for JDK7
 --

 Key: HBASE-7106
 URL: https://issues.apache.org/jira/browse/HBASE-7106
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial

 In JDK7, putting a NULL into a TreeSet throws an NPE, and in the unit tests a
 user can add a NULL qualifier to the family map for GET or SCAN. So we shall
 do the following:
 1) Make sure the semantics of a NULL column qualifier are equal to those of
 the EMPTY_BYTE_ARRAY column qualifier.
 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL
 qualifier in the family map for the GET or SCAN objects; everything else
 remains backward compatible.
 3) Add a jdk option in the pom.xml (assuming the user installed the
 fb-packaged jdk), e.g.: mvn test -Dtest=TestFromClientSide -Pjdk7

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction

2012-11-05 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6371:
--

Attachment: HBase_Tier_Base_Compaction.pdf

The design doc for HBase Tier-based Compaction from Akashnil.

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob
 Attachments: HBASE-6371-089fb-commit.patch, 
 HBase_Tier_Base_Compaction.pdf


 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.
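
To make the configurability concrete, a minimal sketch of what such
per-column-family tuning could look like. Every key name below is invented for
illustration only; the real knobs are defined in the attached design doc
(HBase_Tier_Base_Compaction.pdf).

{code}
  // Hypothetical per-CF tier settings; all key names are invented.
  Configuration conf = HBaseConfiguration.create();
  conf.setInt("hbase.hstore.compaction.cf.MyCF.num.tiers", 3);
  // Tier 0 holds the newest (hottest) files: compact small batches eagerly.
  conf.setLong("hbase.hstore.compaction.cf.MyCF.tier.0.max.age.ms", 3600000L);
  conf.setFloat("hbase.hstore.compaction.cf.MyCF.tier.0.ratio", 1.2f);
  conf.setInt("hbase.hstore.compaction.cf.MyCF.tier.0.min.files", 4);
  // Tier 2 holds the oldest (coldest) files: compact rarely.
  conf.setFloat("hbase.hstore.compaction.cf.MyCF.tier.2.ratio", 0.2f);
{code}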

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-6371) [89-fb] Level based compaction

2012-10-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-6371:
-

Assignee: Liyin Tang  (was: Akashnil)

 [89-fb] Level based compaction
 --

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction

2012-10-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6371:
--

Summary: [89-fb] Tier based compaction  (was: [89-fb] Level based 
compaction)

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6371) [89-fb] Tier based compaction

2012-10-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475952#comment-13475952
 ] 

Liyin Tang commented on HBASE-6371:
---

As Nicolas suggested, renamed the jira to tier-based compaction.

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6968) Several HBase write perf improvement

2012-10-09 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6968:
-

 Summary: Several HBase write perf improvement
 Key: HBASE-6968
 URL: https://issues.apache.org/jira/browse/HBASE-6968
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


There are two improvements in this jira:
1) Change two hotspot synchronized functions into the double-checked locking
pattern, which removes the synchronization overhead in the common case.

2) Avoid creating an HBaseConfiguration object for each HLog. Every time an
HBaseConfiguration object is created, it parses the XML configuration files
from disk, which is not a cheap operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6968) Several HBase write perf improvement

2012-10-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6968:
--

Description: 
Here are two HBase write performance improvements found recently.

1) Avoid creating an HBaseConfiguration object for each HLog. Every time an
HBaseConfiguration object is created, it parses the XML configuration files
from disk, which is not a cheap operation.
In HLog.java:
orig:
{code:title=HLog.java}
  newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf));
{code}
new:
{code}
  newWriter = createWriter(fs, newPath, conf);
{code}


2) Change two hotspot synchronized functions into the double-checked locking
pattern, which removes the synchronization overhead in the common case.
orig:
{code:title=HBaseRpcMetrics.java}
  public synchronized void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      m = create(name);
    }
    m.inc(amt);
  }
{code}

new:
{code}
  public void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      synchronized (this) {
        if ((m = get(name)) == null) {
          m = create(name);
        }
      }
    }
    m.inc(amt);
  }
{code}
=====
orig:
{code:title=MemStoreFlusher.java}
  public synchronized void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
{code}
new:
{code}
  public void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
  private synchronized void flushSomeRegions() {
    // double-check the global memstore size inside the synchronized block
    if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) {
      return;
    }
    ...
  }
{code}



  was:
There are two improvements in this jira:
1) Change two hotspot synchronized functions into the double-checked locking
pattern, which removes the synchronization overhead in the common case.

2) Avoid creating an HBaseConfiguration object for each HLog. Every time an
HBaseConfiguration object is created, it parses the XML configuration files
from disk, which is not a cheap operation.


 Several HBase write perf improvement
 

 Key: HBASE-6968
 URL: https://issues.apache.org/jira/browse/HBASE-6968
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang

 Here are two HBase write performance improvements found recently.
 1) Avoid creating an HBaseConfiguration object for each HLog. Every time an
 HBaseConfiguration object is created, it parses the XML configuration files
 from disk, which is not a cheap operation.
 In HLog.java:
 orig:
 {code:title=HLog.java}
   newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf));
 {code}
 new:
 {code}
   newWriter = createWriter(fs, newPath, conf);
 {code}
 2) Change two hotspot synchronized functions into the double-checked locking
 pattern, which removes the synchronization overhead in the common case.
 orig:
 {code:title=HBaseRpcMetrics.java}
   public synchronized void inc(String name, int amt) {
     MetricsTimeVaryingRate m = get(name);
     if (m == null) {
       m = create(name);
     }
     m.inc(amt);
   }
 {code}
 new:
 {code}
   public void inc(String name, int amt) {
     MetricsTimeVaryingRate m = get(name);
     if (m == null) {
       synchronized (this) {
         if ((m = get(name)) == null) {
           m = create(name);
         }
       }
     }
     m.inc(amt);
   }
 {code}
 =====
 orig:
 {code:title=MemStoreFlusher.java}
   public synchronized void reclaimMemStoreMemory() {
     if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
       flushSomeRegions();
     }
   }
 {code}
 new:
 {code}
   public void reclaimMemStoreMemory() {
     if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
       flushSomeRegions();
     }
   }
   private synchronized void flushSomeRegions() {
     // double-check the global memstore size inside the synchronized block
     if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) {
       return;
     }
     ...
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
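
A note on the double-checked locking shown above: it only removes the lock
from the common path safely if get(name) reads from a thread-safe map. A
minimal alternative sketch using ConcurrentHashMap (illustrative only, not
part of the patch; create(name) is the factory referenced in the original
code):

{code}
  private final ConcurrentMap<String, MetricsTimeVaryingRate> metrics =
      new ConcurrentHashMap<String, MetricsTimeVaryingRate>();

  public void inc(String name, int amt) {
    MetricsTimeVaryingRate m = metrics.get(name);
    if (m == null) {
      // putIfAbsent makes create-if-missing atomic without serializing
      // the common (already-registered) path.
      MetricsTimeVaryingRate created = create(name);
      MetricsTimeVaryingRate prev = metrics.putIfAbsent(name, created);
      m = (prev == null) ? created : prev;
    }
    m.inc(amt);
  }
{code}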


[jira] [Updated] (HBASE-6968) Several HBase write perf improvement

2012-10-09 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6968:
--

Description: 
Here are two recent HBase write performance improvements:

1) Avoid creating an HBaseConfiguration object for each HLog. Every time an
HBaseConfiguration object is created, it parses the XML configuration files
from disk, which is not a cheap operation.
In HLog.java:
orig:
{code:title=HLog.java}
  newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf));
{code}
new:
{code}
  newWriter = createWriter(fs, newPath, conf);
{code}


2) Change two hotspot synchronized functions into the double-checked locking
pattern, which removes the synchronization overhead in the common case.
orig:
{code:title=HBaseRpcMetrics.java}
  public synchronized void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      m = create(name);
    }
    m.inc(amt);
  }
{code}

new:
{code}
  public void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      synchronized (this) {
        if ((m = get(name)) == null) {
          m = create(name);
        }
      }
    }
    m.inc(amt);
  }
{code}
=====
orig:
{code:title=MemStoreFlusher.java}
  public synchronized void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
{code}
new:
{code}
  public void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
  private synchronized void flushSomeRegions() {
    // double-check the global memstore size inside the synchronized block
    if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) {
      return;
    }
    ...
  }
{code}



  was:
Here are two HBase write performance improvements found recently.

1) Avoid creating an HBaseConfiguration object for each HLog. Every time an
HBaseConfiguration object is created, it parses the XML configuration files
from disk, which is not a cheap operation.
In HLog.java:
orig:
{code:title=HLog.java}
  newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf));
{code}
new:
{code}
  newWriter = createWriter(fs, newPath, conf);
{code}


2) Change two hotspot synchronized functions into the double-checked locking
pattern, which removes the synchronization overhead in the common case.
orig:
{code:title=HBaseRpcMetrics.java}
  public synchronized void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      m = create(name);
    }
    m.inc(amt);
  }
{code}

new:
{code}
  public void inc(String name, int amt) {
    MetricsTimeVaryingRate m = get(name);
    if (m == null) {
      synchronized (this) {
        if ((m = get(name)) == null) {
          m = create(name);
        }
      }
    }
    m.inc(amt);
  }
{code}
=====
orig:
{code:title=MemStoreFlusher.java}
  public synchronized void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
{code}
new:
{code}
  public void reclaimMemStoreMemory() {
    if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) {
      flushSomeRegions();
    }
  }
  private synchronized void flushSomeRegions() {
    // double-check the global memstore size inside the synchronized block
    if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) {
      return;
    }
    ...
  }
{code}




 Several HBase write perf improvement
 

 Key: HBASE-6968
 URL: https://issues.apache.org/jira/browse/HBASE-6968
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang

 Here are two recent HBase write performance improvements:
 1) Avoid creating an HBaseConfiguration object for each HLog. Every time an
 HBaseConfiguration object is created, it parses the XML configuration files
 from disk, which is not a cheap operation.
 In HLog.java:
 orig:
 {code:title=HLog.java}
   newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf));
 {code}
 new:
 {code}
   newWriter = createWriter(fs, newPath, conf);
 {code}
 2) Change two hotspot synchronized functions into the double-checked locking
 pattern, which removes the synchronization overhead in the common case.
 orig:
 {code:title=HBaseRpcMetrics.java}
   public synchronized void inc(String name, int amt) {
     MetricsTimeVaryingRate m = get(name);
     if (m == null) {
       m = create(name);
     }
 

[jira] [Created] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly

2012-10-02 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6930:
-

 Summary: [89-fb] Avoid acquiring the same row lock repeatedly
 Key: HBASE-6930
 URL: https://issues.apache.org/jira/browse/HBASE-6930
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang


When processing multiPut, multiMutation, or multiDelete operations, each IPC
handler thread tries to acquire a lock for each row key in the batch. If a
batch contains duplicate row keys, the IPC handler thread previously acquired
the same row lock again and again.

So the optimization is to sort each batch by row key on the client side, and to
skip acquiring the same row lock repeatedly on the server side, as sketched
below.
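
A minimal server-side sketch of the idea (the lock helpers are hypothetical
names for illustration; this assumes the client has already sorted the batch
by row key):

{code}
  // Walk the sorted batch; only (re)acquire the row lock when the row changes.
  byte[] previousRow = null;
  Integer lockId = null;
  for (Mutation op : sortedOps) {
    byte[] row = op.getRow();
    if (previousRow == null || Bytes.compareTo(previousRow, row) != 0) {
      if (lockId != null) {
        releaseRowLock(lockId);        // hypothetical helper
      }
      lockId = obtainRowLock(row);     // hypothetical helper
      previousRow = row;
    }
    applyMutation(op, lockId);         // hypothetical helper
  }
  if (lockId != null) {
    releaseRowLock(lockId);
  }
{code}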

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6911) Set the logging for the location cache hit in the hbase client as trace level

2012-10-01 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6911:
-

 Summary: Set the logging for the location cache hit in the hbase 
client as trace level
 Key: HBASE-6911
 URL: https://issues.apache.org/jira/browse/HBASE-6911
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial


Logging every row-location cache hit in the HBase client is too verbose, so
set it to the trace level.
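
A minimal sketch of the change (commons-logging style, as used in the HBase
client; the message text is illustrative):

{code}
  // Was logged on every cached location lookup; now it only fires when
  // trace logging is explicitly enabled.
  if (LOG.isTraceEnabled()) {
    LOG.trace("Cache hit for row " + Bytes.toStringBinary(row)
        + " in table " + Bytes.toString(tableName));
  }
{code}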

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6911) [89-fb] Set the logging for the location cache hit in the hbase client as trace level

2012-10-01 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6911:
--

Summary: [89-fb] Set the logging for the location cache hit in the hbase 
client as trace level  (was: Set the logging for the location cache hit in the 
hbase client as trace level)

 [89-fb] Set the logging for the location cache hit in the hbase client as 
 trace level
 -

 Key: HBASE-6911
 URL: https://issues.apache.org/jira/browse/HBASE-6911
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Priority: Trivial

 Logging every row-location cache hit in the HBase client is too verbose, so
 set it to the trace level.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6858:
-

 Summary: Fix the incorrect BADVERSION checking in the recoverable 
zookeeper
 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang


Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when
handling a BADVERSION exception for setData(): it shall compare the ID payload
of the data in zk with its own identifier.
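
A minimal sketch of the intended check inside the retrying setData() path
(simplified; appendMetaData() prepends the client identifier to the payload in
RecoverableZooKeeper, while extractIdentifier() is a hypothetical helper that
reads it back):

{code}
  try {
    return zk.setData(path, appendMetaData(data), version);
  } catch (KeeperException.BadVersionException e) {
    // On a retry, BADVERSION may mean our earlier attempt already succeeded.
    // Compare the identifier stored in the znode with our own id.
    if (isRetrying) {
      byte[] stored = zk.getData(path, false, null);
      if (Bytes.equals(extractIdentifier(stored), id)) {
        return zk.exists(path, false);  // our write went through
      }
    }
    throw e;
  }
{code}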

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6858:
--

Attachment: HBASE-6858.patch

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-6858:
-

Assignee: Liyin Tang

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6858:
--

Attachment: (was: HBASE-6858.patch)

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6858:
--

Attachment: HBASE-6858.patch

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460855#comment-13460855
 ] 

Liyin Tang commented on HBASE-6858:
---

Addressed Jimmy's comments!  Thanks Jimmy !

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460872#comment-13460872
 ] 

Liyin Tang commented on HBASE-6858:
---

The code differs between 89 and trunk; some variables have been renamed. Let
me re-submit the patch!

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.94.2, 0.96.0

 Attachments: HBASE-6858.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6858:
--

Attachment: HBASE-6858_v2.patch

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.96.0

 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460934#comment-13460934
 ] 

Liyin Tang commented on HBASE-6858:
---

I agree that this is not a very general solution and may introduce a race
condition: if multiple threads in one zk client try to update the same znode
with different version numbers, the current code will hide the BADVERSION
exception. We didn't find this use case in HBase at the time, roughly 1.5
years ago, and it is cheaper to compare the identifier than to compare the
data payload.
I also believe leaving this kind of assumption in the system may introduce, or
already has introduced, some uncertainty or bugs, and it is definitely worth
improving.

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.96.0

 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460938#comment-13460938
 ] 

Liyin Tang commented on HBASE-6858:
---

@stack, the latest Hudson run does not appear to have built with the latest
patch (HBASE-6858_v2.patch).

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.96.0

 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460947#comment-13460947
 ] 

Liyin Tang commented on HBASE-6858:
---

The recoverable zk was originally intended to recover gracefully from the
connection-loss exception, and I still believe this problem should be solved
by the zookeeper client library rather than by an application such as HBase. A
third option would be to check whether the latest zookeeper can already
recover gracefully from the connection-loss exception; in that case, we could
remove the recoverable zk entirely!



 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.96.0

 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper

2012-09-21 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6858:
--

Attachment: HBASE-6858_v3.patch

Compare the entire data (identifier + data payload) together, as discussed. In
addition, we may need to append the thread id to the identifier.

 Fix the incorrect BADVERSION checking in the recoverable zookeeper
 --

 Key: HBASE-6858
 URL: https://issues.apache.org/jira/browse/HBASE-6858
 Project: HBase
  Issue Type: Bug
  Components: Zookeeper
Reporter: Liyin Tang
Assignee: Liyin Tang
 Fix For: 0.96.0

 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch, 
 HBASE-6858_v3.patch


 Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper
 when handling a BADVERSION exception for setData(): it shall compare the ID
 payload of the data in zk with its own identifier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6673) Clear up the invalid ResultScanner in the ThriftServerRunner

2012-08-27 Thread Liyin Tang (JIRA)
Liyin Tang created HBASE-6673:
-

 Summary: Clear up the invalid ResultScanner in the 
ThriftServerRunner
 Key: HBASE-6673
 URL: https://issues.apache.org/jira/browse/HBASE-6673
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang


Clear up the invalid ResultScanner in the ThriftServerRunner

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-6556) Avoid ssh to localhost in startup scripts

2012-08-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HBASE-6556.
---

Resolution: Duplicate

 Avoid ssh to localhost in startup scripts
 -

 Key: HBASE-6556
 URL: https://issues.apache.org/jira/browse/HBASE-6556
 Project: HBase
  Issue Type: Improvement
  Components: scripts
 Environment: Mac OSX Mountain Lion, HBase 89-fb
Reporter: Ramkumar Vadali
Priority: Trivial

 The use of ssh in scripts like zookeepers.sh and regionservers.sh for a 
 single node setup is not necessary. We can execute the command directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6555) Avoid ssh to localhost in startup scripts

2012-08-20 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438216#comment-13438216
 ] 

Liyin Tang commented on HBASE-6555:
---

Hi Ramkumar, I have resolved 6556, 6557, and 6558 as duplicate jiras.

 Avoid ssh to localhost in startup scripts
 -

 Key: HBASE-6555
 URL: https://issues.apache.org/jira/browse/HBASE-6555
 Project: HBase
  Issue Type: Improvement
  Components: scripts
 Environment: Mac OSX Mountain Lion, HBase 89-fb
Reporter: Ramkumar Vadali
Priority: Trivial

 The use of ssh in scripts like zookeepers.sh and regionservers.sh for a 
 single node setup is not necessary. We can execute the command directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6558) Avoid ssh to localhost in single node setup.

2012-08-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HBASE-6558.
---

Resolution: Duplicate

 Avoid ssh to localhost in single node setup. 
 -

 Key: HBASE-6558
 URL: https://issues.apache.org/jira/browse/HBASE-6558
 Project: HBase
  Issue Type: Improvement
  Components: scripts
 Environment: mac osx mountain lion, hbase 89-fb
Reporter: Ramkumar Vadali
Priority: Trivial
   Original Estimate: 24h
  Remaining Estimate: 24h

 The use of ssh in scripts like zookeepers.sh and regionservers.sh for a 
 single node setup is not necessary. We can execute the command directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-6557) Avoid ssh to localhost in single node setup.

2012-08-20 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang resolved HBASE-6557.
---

Resolution: Duplicate

 Avoid ssh to localhost in single node setup. 
 -

 Key: HBASE-6557
 URL: https://issues.apache.org/jira/browse/HBASE-6557
 Project: HBase
  Issue Type: Improvement
  Components: scripts
 Environment: mac osx mountain lion, hbase 89-fb
Reporter: Ramkumar Vadali
Priority: Trivial

 The use of ssh in scripts like zookeepers.sh and regionservers.sh for a 
 single node setup is not necessary. We can execute the command directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6361) Change the compaction queue to a round robin scheduler

2012-07-10 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-6361:
-

Assignee: Akashnil

 Change the compaction queue to a round robin scheduler
 --

 Key: HBASE-6361
 URL: https://issues.apache.org/jira/browse/HBASE-6361
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Akashnil

 Currently the compaction requests are submitted to the minor/major compaction 
 queue of a region-server from every column-family/region belonging to it. The 
 requests are processed from the queue in FIFO order (First in First out). We 
 want to make a lazy scheduler in place of the current queue-based one. The 
 idea of lazy scheduling is that it is always better to make a decision
 (compaction selection) later if the decision only becomes relevant later.
 Presently, when the queue gets bottlenecked, there is a delay between the
 compaction selection of a request and its execution. Instead, we can postpone
 the compaction selection until the queue is empty, when we will have more
 information and choices (new flush files will have arrived by then) to make a
 better decision.
 Removing the queue, we propose to implement a round-robin scheduler. All the 
 column families in their regions will be visited in sequence periodically. In 
 each visit, if the column family generates a valid compaction request, the 
 request is executed before moving to the next one. We do not plan to change 
 the current compaction algorithm for now. We expect that it will 
 automatically make a better decision when doing just-in-time selection due to 
 the new change. How do we know that? Let us consider an example.
 Suppose there is a short term bottleneck in the queue so that it is blocked 
 for a period of time. (Let the min-files for compaction = 4). For an active 
 column-family, when new flushes are written, new compaction requests, each of 
 size 4, will be added to the queue continuously until the queue starts 
 processing them.
 Now consider a round-robin scheduler. A bottleneck due to the IO rate of
 compaction results in a longer latency before the same column family is
 visited again. When the same active column family is visited following a long
 delay, suppose 16 new flush files have been written there. The compaction 
 selection algorithm will select one compaction request of size 16, as opposed 
 to 4 compaction requests of size 4 that would have been generated in the 
 previous case.
 A compaction request with 16 flush files is more IOPs-efficient than the same
 set of files being compacted 4 at a time: both consume the same total amount
 of reads and writes, but the former produces one file of size 16 instead of 4
 files of size 4. So, in the second case, we obtained the 4*4 -> 16 merge for
 free. With the queue, those smaller size-4 files would have consumed more
 IOPs to become bigger later.
 On my simulator, I ran some experiments on how a bottleneck of the queue
 affects compaction selections in the current system. It appears that a
 filled-up queue actually makes all future compaction selections less and less
 IOPs-efficient, resulting in a runaway positive feedback loop which can
 potentially explode the compaction queue. (This was also observed in
 production recently.) The main effect of this change should be to deal with
 bursty loads: when a bottleneck occurs, the compaction selection will become
 more IOPs-efficient rather than less efficient, providing negative feedback
 and restoring stability more easily. As for monitoring, the compaction queue
 size will no longer be present as a metric; however, the number of files in
 each compaction will indicate whether a bottleneck has occurred. A sketch of
 the proposed scheduler loop follows.
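
A minimal sketch of the proposed scheduler loop (all names are illustrative;
the real change would live in the region server's compaction machinery):

{code}
  // Visit every store in round-robin order. Selection is deferred until the
  // visit itself, so flushes that arrived during a busy sweep coalesce into
  // one larger, more IOPs-efficient compaction request.
  while (!stopped) {
    for (Store store : storesInRoundRobinOrder()) {  // hypothetical iterator
      CompactionRequest request = store.requestCompaction();
      if (request != null) {
        request.execute();  // finish before visiting the next store
      }
    }
    sleepBetweenSweeps();   // hypothetical pacing between full sweeps
  }
{code}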

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



