[jira] [Created] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat
Liyin Tang created HBASE-15482: -- Summary: Provide an option to skip calculating block locations for SnapshotInputFormat Key: HBASE-15482 URL: https://issues.apache.org/jira/browse/HBASE-15482 Project: HBase Issue Type: Improvement Components: mapreduce Reporter: Liyin Tang Priority: Minor When a MR job is reading from SnapshotInputFormat, it needs to calculate the splits based on the block locations in order to get the best locality. However, this process may take a long time for large snapshots. In some setups, the computing layer (Spark, Hive, or Presto) runs outside of the HBase cluster. In these scenarios, block locality doesn't matter. Therefore, it would be great to have an option to skip calculating the block locations for every job. That would be super useful for the Hive/Presto/Spark connectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
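A minimal sketch of how a job could opt out once such an option exists. The configuration key below is hypothetical — this issue only proposes the option — while the surrounding calls are stock Hadoop/HBase configuration API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SnapshotScanJobSetup {
  // Hypothetical flag name illustrating the proposal; not an existing key here.
  static final String LOCALITY_ENABLED_KEY =
      "hbase.TableSnapshotInputFormat.locality.enabled";

  public static Configuration configure() {
    Configuration conf = HBaseConfiguration.create();
    // The compute layer runs outside the HBase cluster, so skip the expensive
    // block-location lookup when calculating splits.
    conf.setBoolean(LOCALITY_ENABLED_KEY, false);
    return conf;
  }
}
```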
[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat
[ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202071#comment-15202071 ] Liyin Tang commented on HBASE-15482: Yeah, that's right. Ideally, if SnapshotInputFormat could read directly from the snapshot instead of restoring it, that would be awesome! Restoring millions of storefiles also takes a long time. But that is out of the scope of this jira. > Provide an option to skip calculating block locations for SnapshotInputFormat > - > > Key: HBASE-15482 > URL: https://issues.apache.org/jira/browse/HBASE-15482 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Liyin Tang >Priority: Minor > > When a MR job is reading from SnapshotInputFormat, it needs to calculate the > splits based on the block locations in order to get the best locality. However, > this process may take a long time for large snapshots. > In some setups, the computing layer (Spark, Hive, or Presto) runs outside > of the HBase cluster. In these scenarios, block locality doesn't matter. > Therefore, it would be great to have an option to skip calculating the block > locations for every job. That would be super useful for the Hive/Presto/Spark > connectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15482) Provide an option to skip calculating block locations for SnapshotInputFormat
[ https://issues.apache.org/jira/browse/HBASE-15482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15202539#comment-15202539 ] Liyin Tang commented on HBASE-15482: Dave, thanks for the response. Even if we use HDFS snapshots, it would be great to have an option to skip calculating block locations. To decouple compute from storage, it is possible to set up the computing layer for query engines like Spark/Hive/Presto in a different cluster. In these cases, locality doesn't matter for either HBase or HDFS snapshots. > Provide an option to skip calculating block locations for SnapshotInputFormat > - > > Key: HBASE-15482 > URL: https://issues.apache.org/jira/browse/HBASE-15482 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Liyin Tang >Priority: Minor > > When a MR job is reading from SnapshotInputFormat, it needs to calculate the > splits based on the block locations in order to get the best locality. However, > this process may take a long time for large snapshots. > In some setups, the computing layer (Spark, Hive, or Presto) runs outside > of the HBase cluster. In these scenarios, block locality doesn't matter. > Therefore, it would be great to have an option to skip calculating the block > locations for every job. That would be super useful for the Hive/Presto/Spark > connectors. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018165#comment-14018165 ] Liyin Tang commented on HBASE-8763: --- Hi, I am out of the office from 9/1/2012 to 9/16/2012 and cannot access this email. In urgent cases, please forward your email to liyint...@gmail.com Thanks a lot Liyin [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10893) Bug in Fast Diff Delta Block Encoding
[ https://issues.apache.org/jira/browse/HBASE-10893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13957194#comment-13957194 ] Liyin Tang commented on HBASE-10893: That's quite a serious bug, and [~manukranthk] already has a fix for it. Bug in Fast Diff Delta Block Encoding - Key: HBASE-10893 URL: https://issues.apache.org/jira/browse/HBASE-10893 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.89-fb Reporter: Manukranth Kolloju Assignee: Manukranth Kolloju Fix For: 0.89-fb The following 2 key values, if encoded and decoded with Fast Diff Delta Block encoding, produce wrong results: byte[] row = Bytes.toBytes("abcd"); byte[] family = new byte[] { 'f' }; byte[] qualifier0 = new byte[] { 'b' }; byte[] qualifier1 = new byte[] { 'c' }; byte[] value0 = new byte[] { 0x01 }; byte[] value1 = new byte[] { 0x00 }; kvList.add(new KeyValue(row, family, qualifier0, 0, Type.Put, value0)); kvList.add(new KeyValue(row, family, qualifier1, 0, Type.Put, value1)); -- This message was sent by Atlassian JIRA (v6.2#6252)
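For reference, a self-contained version of the key values from the report (with the quotes around "abcd" restored, which the archive formatting dropped); the list built here is what an encode/decode round-trip test for FAST_DIFF would consume.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.KeyValue.Type;
import org.apache.hadoop.hbase.util.Bytes;

public class FastDiffRepro {
  /** Builds the two KeyValues reported to round-trip incorrectly under FAST_DIFF. */
  public static List<KeyValue> buildKvs() {
    byte[] row = Bytes.toBytes("abcd");
    byte[] family = new byte[] { 'f' };
    byte[] qualifier0 = new byte[] { 'b' };
    byte[] qualifier1 = new byte[] { 'c' };
    byte[] value0 = new byte[] { 0x01 };  // the two values differ...
    byte[] value1 = new byte[] { 0x00 };  // ...by a single byte
    List<KeyValue> kvList = new ArrayList<KeyValue>();
    kvList.add(new KeyValue(row, family, qualifier0, 0, Type.Put, value0));
    kvList.add(new KeyValue(row, family, qualifier1, 0, Type.Put, value1));
    return kvList;
  }
}
```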
[jira] [Created] (HBASE-10784) [89-fb] Avoid the unnecessary memory copy for RowCol and DeleteColumn Bloom filters
Liyin Tang created HBASE-10784: -- Summary: [89-fb] Avoid the unnecessary memory copy for RowCol and DeleteColumn Bloom filters Key: HBASE-10784 URL: https://issues.apache.org/jira/browse/HBASE-10784 Project: HBase Issue Type: Improvement Reporter: Liyin Tang When adding to or querying the RowCol and DeleteColumn Bloom filters, there are multiple unnecessary memory copy operations. This jira is to address that concern and avoid creating these dummy bloom keys as much as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10191) Move large arena storage off heap
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934565#comment-13934565 ] Liyin Tang commented on HBASE-10191: Just curious, has anyone experienced imbalanced memory allocation among the NUMA nodes when allocating a large off-heap arena? Move large arena storage off heap - Key: HBASE-10191 URL: https://issues.apache.org/jira/browse/HBASE-10191 Project: HBase Issue Type: Umbrella Reporter: Andrew Purtell Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size. We can address this using parallel strategies. We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Set aside the question of if this is too little too late, if one becomes available the benefit will be immediate though subject to qualification for production, and transparent in terms of code changes. However in the meantime we need an answer for Java versions already in production. This requires we move the large arena allocations off heap, those being the blockcache and memstore. On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and on flushing memstore into blockcache (HBASE-5311), which is related work. We should build off heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb through zero copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about - it could be possible to provide language level object views over off heap memory, on heap objects could hold references to objects backed by off heap memory but not vice versa, maybe facilitated by new intrinsics in Unsafe. Again we need an answer for today also. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that and also zero copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (HBASE-10191) Move large arena storage off heap
[ https://issues.apache.org/jira/browse/HBASE-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-10191: --- Comment: was deleted (was: Just curious, has anyone experienced imbalanced memory allocation among the NUMA nodes when allocating a large off-heap arena?) Move large arena storage off heap - Key: HBASE-10191 URL: https://issues.apache.org/jira/browse/HBASE-10191 Project: HBase Issue Type: Umbrella Reporter: Andrew Purtell Even with the improved G1 GC in Java 7, Java processes that want to address large regions of memory while also providing low high-percentile latencies continue to be challenged. Fundamentally, a Java server process that has high data throughput and also tight latency SLAs will be stymied by the fact that the JVM does not provide a fully concurrent collector. There is simply not enough throughput to copy data during GC under safepoint (all application threads suspended) within available time bounds. This is increasingly an issue for HBase users operating under dual pressures: 1. tight response SLAs, 2. the increasing amount of RAM available in commodity server configurations, because GC load is roughly proportional to heap size. We can address this using parallel strategies. We should talk with the Java platform developer community about the possibility of a fully concurrent collector appearing in OpenJDK somehow. Set aside the question of if this is too little too late, if one becomes available the benefit will be immediate though subject to qualification for production, and transparent in terms of code changes. However in the meantime we need an answer for Java versions already in production. This requires we move the large arena allocations off heap, those being the blockcache and memstore. On other JIRAs recently there has been related discussion about combining the blockcache and memstore (HBASE-9399) and on flushing memstore into blockcache (HBASE-5311), which is related work. We should build off heap allocation for memstore and blockcache, perhaps a unified pool for both, and plumb through zero copy direct access to these allocations (via direct buffers) through the read and write I/O paths. This may require the construction of classes that provide object views over data contained within direct buffers. This is something else we could talk with the Java platform developer community about - it could be possible to provide language level object views over off heap memory, on heap objects could hold references to objects backed by off heap memory but not vice versa, maybe facilitated by new intrinsics in Unsafe. Again we need an answer for today also. We should investigate what existing libraries may be available in this regard. Key will be avoiding marshalling/unmarshalling costs. At most we should be copying primitives out of the direct buffers to register or stack locations until finally copying data to construct protobuf Messages. A related issue there is HBASE-9794, which proposes scatter-gather access to KeyValues when constructing RPC messages. We should see how far we can get with that and also zero copy construction of protobuf Messages backed by direct buffer allocations. Some amount of native code may be required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path
[ https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921860#comment-13921860 ] Liyin Tang commented on HBASE-10659: One of the key motivations is to avoid the handlers waiting on the sync thread, since that model requires more IPC handler threads to reach the maximum QPS. I will share more detailed numbers once they are ready. [89-fb] Optimize the threading model in HBase write path Key: HBASE-10659 URL: https://issues.apache.org/jira/browse/HBASE-10659 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Recently, we have done multiple prototypes to optimize the HBase (0.89) write path, and based on the simulator results, the following model is able to achieve much higher overall throughput with fewer threads. IPC Writer Threads Pool: IPC handler threads will prepare all Put requests, append the WALEdit, as one transaction, into a concurrent collection while holding a read lock, and then just return. HLogSyncer Thread: Each HLogSyncer thread corresponds to one HLog stream. It swaps the concurrent collection out while holding a write lock, then iterates over all the elements in the previous concurrent collection, generates the sequence id for each transaction, and writes to the HLog. After the HLog sync is done, it appends these transactions as a batch into a blocking queue. Memstore Update Thread: The memstore update thread will poll the blocking queue and update the memstore for each transaction, using the sequence id as the MVCC. Once the memstore update is done, it dispatches to the responder thread pool to return to the client. Responder Thread Pool: The responder thread pool will return the RPC calls in parallel. We are still evaluating this model and will share more results/numbers once they are ready. But we'd really appreciate any comments in advance! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path
[ https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919818#comment-13919818 ] Liyin Tang commented on HBASE-10659: 1) The IPC writer thread will do all the sanity checks as preparation, such as figuring out which Region a request belongs to and whether that region is enabled. 2) The IPC writer thread will hand off the Put request and then start to process the next IPC request. It won't block or wait for the current Put request to finish. The responder thread will finally return the call to the clients. 3) There is one HLogSyncer per WAL, and each HLogSyncer has its own pair of concurrent collections to swap between. 4) I don't fully understand your last question. Since the HLogSyncer thread has already done the sequencing for each transaction, the memstore-update-thread can just reuse the same sequence id as the MVCC. The basic motivation of this new write path is to reduce the thread interleaving and synchronization in the critical write path as much as possible. [89-fb] Optimize the threading model in HBase write path Key: HBASE-10659 URL: https://issues.apache.org/jira/browse/HBASE-10659 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Recently, we have done multiple prototypes to optimize the HBase (0.89) write path, and based on the simulator results, the following model is able to achieve much higher overall throughput with fewer threads. IPC Writer Threads Pool: IPC handler threads will prepare all Put requests, append the WALEdit, as one transaction, into a concurrent collection while holding a read lock, and then just return. HLogSyncer Thread: Each HLogSyncer thread corresponds to one HLog stream. It swaps the concurrent collection out while holding a write lock, then iterates over all the elements in the previous concurrent collection, generates the sequence id for each transaction, and writes to the HLog. After the HLog sync is done, it appends these transactions as a batch into a blocking queue. Memstore Update Thread: The memstore update thread will poll the blocking queue and update the memstore for each transaction, using the sequence id as the MVCC. Once the memstore update is done, it dispatches to the responder thread pool to return to the client. Responder Thread Pool: The responder thread pool will return the RPC calls in parallel. We are still evaluating this model and will share more results/numbers once they are ready. But we'd really appreciate any comments in advance! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path
Liyin Tang created HBASE-10659: -- Summary: [89-fb] Optimize the threading model in HBase write path Key: HBASE-10659 URL: https://issues.apache.org/jira/browse/HBASE-10659 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Recently, we have done multiple prototypes to optimize the HBase (0.89) write path, and based on the simulator results, the following model is able to achieve much higher overall throughput with fewer threads. IPC Writer Threads Pool: IPC handler threads will prepare all Put requests, append the WALEdit, as one transaction, into a concurrent collection while holding a read lock, and then just return. HLogSyncer Thread: Each HLogSyncer thread corresponds to one HLog stream. It swaps the concurrent collection out while holding a write lock, then iterates over all the elements in the previous concurrent collection, generates the sequence id for each transaction, and writes to the HLog. After the HLog sync is done, it appends these transactions as a batch into a blocking queue. Memstore Update Thread: The memstore update thread will poll the blocking queue and update the memstore for each transaction, using the sequence id as the MVCC. Once the memstore update is done, it dispatches to the responder thread pool to return to the client. Responder Thread Pool: The responder thread pool will return the RPC calls in parallel. We are still evaluating this model and will share more results/numbers once they are ready. But we'd really appreciate any comments in advance! -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10659) [89-fb] Optimize the threading model in HBase write path
[ https://issues.apache.org/jira/browse/HBASE-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919033#comment-13919033 ] Liyin Tang commented on HBASE-10659: 1) Since updating the memstore is much faster than HLog syncing, one memstore-update-thread seems to be sufficient. Or we can make it configurable so that each HLogSyncer thread has a corresponding memstore-update-thread. 2) The HLogSyncer thread will batch multiple transactions, as a group commit, from different IPC writer threads, and then sync this group commit into the HLog stream. Then the memstore-update-thread will take this group commit and update the corresponding memstore in (sequence id) order. [89-fb] Optimize the threading model in HBase write path Key: HBASE-10659 URL: https://issues.apache.org/jira/browse/HBASE-10659 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Recently, we have done multiple prototypes to optimize the HBase (0.89) write path, and based on the simulator results, the following model is able to achieve much higher overall throughput with fewer threads. IPC Writer Threads Pool: IPC handler threads will prepare all Put requests, append the WALEdit, as one transaction, into a concurrent collection while holding a read lock, and then just return. HLogSyncer Thread: Each HLogSyncer thread corresponds to one HLog stream. It swaps the concurrent collection out while holding a write lock, then iterates over all the elements in the previous concurrent collection, generates the sequence id for each transaction, and writes to the HLog. After the HLog sync is done, it appends these transactions as a batch into a blocking queue. Memstore Update Thread: The memstore update thread will poll the blocking queue and update the memstore for each transaction, using the sequence id as the MVCC. Once the memstore update is done, it dispatches to the responder thread pool to return to the client. Responder Thread Pool: The responder thread pool will return the RPC calls in parallel. We are still evaluating this model and will share more results/numbers once they are ready. But we'd really appreciate any comments in advance! -- This message was sent by Atlassian JIRA (v6.2#6252)
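Read in sequence, the thread roles described in this thread amount to a swap-under-lock group-commit pipeline. Below is a minimal, hypothetical Java sketch of that pattern: IPC handlers append under the read lock, the syncer swaps the collection under the write lock, assigns sequence ids and group-commits, and the memstore updater drains batches in sequence-id order. The class and method names are illustrative, not the actual 0.89-fb code, and the HLog write and memstore update themselves are elided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WritePathSketch {
  static class Transaction { long seqId; /* WALEdit payload elided */ }

  private final ReentrantReadWriteLock swapLock = new ReentrantReadWriteLock();
  private Queue<Transaction> pending = new ConcurrentLinkedQueue<Transaction>();
  private final BlockingQueue<List<Transaction>> synced =
      new LinkedBlockingQueue<List<Transaction>>();
  private final AtomicLong seqGen = new AtomicLong();

  /** IPC handler threads: append the edit and return without waiting on sync. */
  public void append(Transaction txn) {
    swapLock.readLock().lock();            // many handlers may append concurrently
    try { pending.add(txn); } finally { swapLock.readLock().unlock(); }
  }

  /** HLogSyncer thread: one iteration of the group-commit loop. */
  public void syncOnce() throws InterruptedException {
    Queue<Transaction> batch;
    swapLock.writeLock().lock();           // exclusive: swap the collection out
    try {
      batch = pending;
      pending = new ConcurrentLinkedQueue<Transaction>();
    } finally { swapLock.writeLock().unlock(); }

    List<Transaction> group = new ArrayList<Transaction>();
    for (Transaction t : batch) {
      t.seqId = seqGen.incrementAndGet();  // sequencing happens on the syncer
      group.add(t);                        // write to HLog here (elided), then...
    }
    if (!group.isEmpty()) synced.put(group); // ...hand the group commit over
  }

  /** Memstore update thread: apply each synced batch in seqId order. */
  public void applyOnce() throws InterruptedException {
    for (Transaction t : synced.take()) {
      // Update the memstore using t.seqId as the MVCC write number (elided),
      // then dispatch to the responder pool to answer the client.
    }
  }
}
```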
[jira] [Commented] (HBASE-10578) For the same row key, the KV in the newest StoreFile should be returned
[ https://issues.apache.org/jira/browse/HBASE-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13907555#comment-13907555 ] Liyin Tang commented on HBASE-10578: Nice finding! For the same row key, the KV in the newest StoreFile should be returned --- Key: HBASE-10578 URL: https://issues.apache.org/jira/browse/HBASE-10578 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 0.89-fb, 0.98.1 Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Minor Fix For: 0.89-fb Attachments: HBASE-10578.patch When multiple scanners have the same KV, HBase should pick the newest one, i.e. pick the KV from the store file with the largest seq id. In the KeyValueHeap generalizedSeek implementation, we seem to prefer the current scanner over the scanners in the heap -- THIS IS WRONG. The diff adds a unit test to make sure that bulk loading works correctly, and fixes the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
Liyin Tang created HBASE-10502: -- Summary: [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities: * The initialize function will initialize all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request. * The next function will call the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then return all the results together as a list. Also, if the result list is empty, it indicates that there is no data left in any of the scanners and the user can call {@link #close()} afterwards. * The close function will close all the scanners and shut down the thread pool. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
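A minimal sketch of the utility described above, assuming the 0.89-era client API (HTable#getScanner(Scan), ResultScanner#next(int)); error handling and per-scanner exhaustion handling are trimmed, and the class name is illustrative rather than the committed code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ParallelScannerSketch {
  private final HTable table;
  private final List<Scan> scans;
  private final int caching;               // all scans share one caching size
  private final ExecutorService pool = Executors.newCachedThreadPool();
  private final List<ResultScanner> scanners = new ArrayList<ResultScanner>();

  public ParallelScannerSketch(HTable table, List<Scan> scans, int caching) {
    this.table = table; this.scans = scans; this.caching = caching;
  }

  /** Open all ResultScanners in parallel. */
  public void initialize() throws Exception {
    List<Future<ResultScanner>> futures = new ArrayList<Future<ResultScanner>>();
    for (final Scan scan : scans) {
      scan.setCaching(caching);
      futures.add(pool.submit(new Callable<ResultScanner>() {
        public ResultScanner call() throws IOException { return table.getScanner(scan); }
      }));
    }
    for (Future<ResultScanner> f : futures) scanners.add(f.get());
  }

  /** Fetch the next `caching` rows from every scanner in parallel; empty list == done. */
  public List<Result> next() throws Exception {
    List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
    for (final ResultScanner rs : scanners) {
      futures.add(pool.submit(new Callable<Result[]>() {
        public Result[] call() throws IOException { return rs.next(caching); }
      }));
    }
    List<Result> all = new ArrayList<Result>();
    for (Future<Result[]> f : futures) for (Result r : f.get()) all.add(r);
    return all;
  }

  /** Close every scanner and shut the pool down. */
  public void close() {
    for (ResultScanner rs : scanners) rs.close();
    pool.shutdown();
  }
}
```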
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898128#comment-13898128 ] Liyin Tang commented on HBASE-10502: By skimming through HBASE-9272, the semantics seem to be a little different. In this case, the client actually wants to construct multiple scan requests, while HBASE-9272 is about performing a single scan request in parallel. [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities: * The initialize function will initialize all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request. * The next function will call the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then return all the results together as a list. Also, if the result list is empty, it indicates that there is no data left in any of the scanners and the user can call {@link #close()} afterwards. * The close function will close all the scanners and shut down the thread pool. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898132#comment-13898132 ] Liyin Tang commented on HBASE-10502: Actually, HBASE-9272 + HBASE-10502 together are quite effective for optimizing join queries. Assume a join query where Table A joins Table B on the row key or some prefix: HBASE-9272 is useful for issuing the initial scan in parallel to retrieve all the join keys, and then, based on the join keys, multiple scan queries against Table B can be constructed and submitted in parallel via HBASE-10502. [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities: * The initialize function will initialize all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request. * The next function will call the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then return all the results together as a list. Also, if the result list is empty, it indicates that there is no data left in any of the scanners and the user can call {@link #close()} afterwards. * The close function will close all the scanners and shut down the thread pool. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10502) [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel.
[ https://issues.apache.org/jira/browse/HBASE-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898138#comment-13898138 ] Liyin Tang commented on HBASE-10502: In addition, the API of HBASE-10502 seems more flexible (to me). If there is a single scan request spanning multiple region boundaries, the HBase client is always able to split this scan request into multiple region-local scan requests and then submit them to HBASE-10502 for parallel execution. [89-fb] ParallelScanner: a client utility to perform multiple scan requests in parallel. Key: HBASE-10502 URL: https://issues.apache.org/jira/browse/HBASE-10502 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Fix For: 0.89-fb ParallelScanner is a utility class for the HBase client to perform multiple scan requests in parallel. It requires all the scan requests to have the same caching size, for simplicity. This class provides 3 very basic functionalities: * The initialize function will initialize all the ResultScanners by calling {@link HTable#getScanner(Scan)} in parallel for each scan request. * The next function will call the corresponding {@link ResultScanner#next(int numRows)} for each scan request in parallel, and then return all the results together as a list. Also, if the result list is empty, it indicates that there is no data left in any of the scanners and the user can call {@link #close()} afterwards. * The close function will close all the scanners and shut down the thread pool. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10296) Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13884979#comment-13884979 ] Liyin Tang commented on HBASE-10296: Speaking of Raft implementations: we, the FB HBase team, are very close to open-sourcing a Raft implementation as a library, and there are multiple potential ways to integrate the Raft protocol into the HBase/HDFS software stack. Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency -- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua Currently the master relies on ZK to elect the active master, monitor liveness, and store almost all of its state, such as region states, table info, replication info, and so on. ZK also acts as a channel for master-regionserver communication (such as in region assignment) and client-regionserver communication (such as replication state/behavior changes). But ZK as a communication channel is fragile due to its one-time watches and asynchronous notification mechanism, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of the state transition logic to keep the region assignment state machine correct. Actually, almost all of the trickiest inconsistency issues trace their root cause back to the fragility of ZK as a communication channel. Replacing ZK with Paxos running within the master processes has the following benefits: 1. Better master failover performance: all masters, both the active and the standby ones, have the same latest state in memory (except lagging ones, which can eventually catch up later). Whenever the active master dies, the newly elected active master can immediately play its role without such failover work as rebuilding its in-memory state by consulting the meta table and ZK. 2. Better state consistency: the master's in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning. And though the state is held by all masters, Paxos guarantees the copies are identical at any time. 3. A more direct and simple communication pattern: clients change state by sending requests to the master; the master and regionservers talk directly to each other by sending requests and responses. None of them needs to go through a third-party storage like ZK, which can introduce more uncertainty, worse latency, and more complexity. 4. ZK would only be used for liveness monitoring to determine whether a regionserver is dead, and later on we could eliminate ZK entirely once we build heartbeats between the master and regionservers. I know this might look like a very crazy re-architecture, but it deserves deep thinking and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7404) Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE
[ https://issues.apache.org/jira/browse/HBASE-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879091#comment-13879091 ] Liyin Tang commented on HBASE-7404: --- Liang, just curious, what's the top contributor to the p99 latency in your case? Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE -- Key: HBASE-7404 URL: https://issues.apache.org/jira/browse/HBASE-7404 Project: HBase Issue Type: New Feature Affects Versions: 0.94.3 Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.95.0 Attachments: 7404-0.94-fixed-lines.txt, 7404-trunk-v10.patch, 7404-trunk-v11.patch, 7404-trunk-v12.patch, 7404-trunk-v13.patch, 7404-trunk-v13.txt, 7404-trunk-v14.patch, BucketCache.pdf, HBASE-7404-backport-0.94.patch, Introduction of Bucket Cache.pdf, hbase-7404-94v2.patch, hbase-7404-trunkv2.patch, hbase-7404-trunkv9.patch First, thanks @neil from Fusion-IO for sharing the source code. Usage: 1. Use bucket cache as the main memory cache, configured as follows: –hbase.bucketcache.ioengine heap (or offheap if using offheap memory to cache blocks) –hbase.bucketcache.size 0.4 (size for the bucket cache; 0.4 is a percentage of the max heap size) 2. Use bucket cache as a secondary cache, configured as follows: –hbase.bucketcache.ioengine file:/disk1/hbase/cache.data (the file path where the block data is stored) –hbase.bucketcache.size 1024 (size for the bucket cache; the unit is MB, so 1024 means 1GB) –hbase.bucketcache.combinedcache.enabled false (default value being true) See more configurations in org.apache.hadoop.hbase.io.hfile.CacheConfig and org.apache.hadoop.hbase.io.hfile.bucket.BucketCache What's Bucket Cache? It can greatly decrease CMS pauses and heap fragmentation caused by GC, and it supports a large cache space for high read performance by using high-speed disks like Fusion-io. 1. An implementation of block cache like LruBlockCache 2. Self-manages the blocks' storage positions through the Bucket Allocator 3. The cached blocks can be stored in memory or on the file system 4. Bucket Cache can be used as the main block cache (see CombinedBlockCache), combined with LruBlockCache, to decrease CMS pauses and fragmentation caused by GC 5. BucketCache can also be used as a secondary cache (e.g. using Fusion-io to store blocks) to enlarge the cache space How about SlabCache? We studied and tested SlabCache first, but the results were bad, because: 1. SlabCache uses SingleSizeCache, and its memory utilization is low because of the variety of block sizes, especially when using DataBlockEncoding 2. SlabCache is used in DoubleBlockCache; a block is cached both in SlabCache and LruBlockCache, and on a SlabCache hit the block is put into LruBlockCache again, so CMS pauses and heap fragmentation don't get any better 3. Direct (off-heap) performance is not as good as heap and may cause OOM, so we recommend using the heap engine See more in the attachment and in the patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter
[ https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872414#comment-13872414 ] Liyin Tang commented on HBASE-10342: Interesting! If the most significant part of the row key is evenly distributed across the row key space, we usually don't need to salt the table, right? RowKey Prefix Bloom Filter -- Key: HBASE-10342 URL: https://issues.apache.org/jira/browse/HBASE-10342 Project: HBase Issue Type: New Feature Reporter: Liyin Tang When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting by id1. In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7509) Enable RS to query a secondary datanode in parallel, if the primary takes too long
[ https://issues.apache.org/jira/browse/HBASE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872509#comment-13872509 ] Liyin Tang commented on HBASE-7509: --- I guess Amitanand probably has a diff for 89-fb, but not for trunk yet. But thanks, Liang, for following up! Enable RS to query a secondary datanode in parallel, if the primary takes too long -- Key: HBASE-7509 URL: https://issues.apache.org/jira/browse/HBASE-7509 Project: HBase Issue Type: Improvement Reporter: Amitanand Aiyer Assignee: Liang Xie Priority: Critical Attachments: quorumDiffs.tgz -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10360) [89-fb] Expose the HRegionLocations for each HTable in an efficient way
Liyin Tang created HBASE-10360: -- Summary: [89-fb] Expose the HRegionLocations for each HTable in an efficient way Key: HBASE-10360 URL: https://issues.apache.org/jira/browse/HBASE-10360 Project: HBase Issue Type: Improvement Reporter: Liyin Tang HTable.getHRegionInfo() will return all the RegionServer addresses for each HRegion by scanning the META table. Actually, the HConnectionManager could cache this data and refresh the client location cache directly. Also, HTable could expose another API to return these cached HRegionLocations directly without scanning the META table. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-7509) Enable RS to query a secondary datanode in parallel, if the primary takes too long
[ https://issues.apache.org/jira/browse/HBASE-7509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873019#comment-13873019 ] Liyin Tang commented on HBASE-7509: --- [~xieliang007], actually I think it makes more sense to assign the JIRA to you, considering you are the one actively working on the diff for HBase trunk. Enable RS to query a secondary datanode in parallel, if the primary takes too long -- Key: HBASE-7509 URL: https://issues.apache.org/jira/browse/HBASE-7509 Project: HBase Issue Type: Improvement Reporter: Amitanand Aiyer Assignee: Amitanand Aiyer Priority: Critical Attachments: quorumDiffs.tgz -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10342) RowKey Prefix Bloom Filter
Liyin Tang created HBASE-10342: -- Summary: RowKey Prefix Bloom Filter Key: HBASE-10342 URL: https://issues.apache.org/jira/browse/HBASE-10342 Project: HBase Issue Type: New Feature Reporter: Liyin Tang When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting at id1 . In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
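A small sketch of the rowkey layout from the example (md5(id1) + id1 + id2) and of the prefix scan it enables; the helper names are illustrative, and the rowkey-prefix bloom filter itself is what this issue proposes, so it is not shown here.

```java
import java.security.MessageDigest;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixRowKey {
  static byte[] md5(byte[] in) throws Exception {
    return MessageDigest.getInstance("MD5").digest(in);
  }

  /** Full rowkey: md5(id1) + id1 + id2. */
  static byte[] rowKey(byte[] id1, byte[] id2) throws Exception {
    return Bytes.add(md5(id1), id1, id2);
  }

  /**
   * All rows sharing id1 share the md5(id1) + id1 prefix, so one prefix scan
   * covers them; a rowkey-prefix bloom filter could then skip store files
   * whose blooms rule this prefix out, cutting unnecessary seeks.
   */
  static Scan scanById1(byte[] id1) throws Exception {
    byte[] prefix = Bytes.add(md5(id1), id1);
    Scan scan = new Scan(prefix);           // start at the first matching row
    scan.setFilter(new PrefixFilter(prefix)); // stop once the prefix no longer matches
    return scan;
  }
}
```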
[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter
[ https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871684#comment-13871684 ] Liyin Tang commented on HBASE-10342: This feature shall benefit Salted Tables as well. RowKey Prefix Bloom Filter -- Key: HBASE-10342 URL: https://issues.apache.org/jira/browse/HBASE-10342 Project: HBase Issue Type: New Feature Reporter: Liyin Tang When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting at id1 . In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10343) Write the last sequence id into the HLog during the RegionOpen time
Liyin Tang created HBASE-10343: -- Summary: Write the last sequence id into the HLog during the RegionOpen time Key: HBASE-10343 URL: https://issues.apache.org/jira/browse/HBASE-10343 Project: HBase Issue Type: Improvement Reporter: Liyin Tang HLog-based async replication has a challenge in guaranteeing in-order delivery when a Region moves from one HLog stream to another. One approach is to record the last_sequence_id in the new HLog stream when opening the Region, so that the replication framework is able to catch up to the last_sequence_id from the previous HLog stream before replicating any new transactions through the new HLog stream. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
[ https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871719#comment-13871719 ] Liyin Tang commented on HBASE-10275: HBASE-10343 might resolve this issue in a much easier way. [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing Key: HBASE-10275 URL: https://issues.apache.org/jira/browse/HBASE-10275 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang [HBASE-8741] has implemented the per-region sequence ID. It would be even better to guarantee that the sequencing is strictly monotonically increasing so that HLog-Based Async Replication is able to deliver transactions in order in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10342) RowKey Prefix Bloom Filter
[ https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13871725#comment-13871725 ] Liyin Tang commented on HBASE-10342: Yes, a prefix-hash memstore will help this case as well! It is definitely worth benchmarking. RowKey Prefix Bloom Filter -- Key: HBASE-10342 URL: https://issues.apache.org/jira/browse/HBASE-10342 Project: HBase Issue Type: New Feature Reporter: Liyin Tang When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting at id1 . In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10342) RowKey Prefix Bloom Filter
[ https://issues.apache.org/jira/browse/HBASE-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-10342: --- Description: When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting by id1. In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. (was: When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting at id1 . In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan.) RowKey Prefix Bloom Filter -- Key: HBASE-10342 URL: https://issues.apache.org/jira/browse/HBASE-10342 Project: HBase Issue Type: New Feature Reporter: Liyin Tang When designing HBase schema for some use cases, it is quite common to combine multiple information within the RowKey. For instance, assuming that rowkey is constructed as md5(id1) + id1 + id2, and user wants to scan all the rowkeys which starting by id1. In such case, the rowkey bloom filter is able to cut more unnecessary seeks during the scan. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)
[ https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861765#comment-13861765 ] Liyin Tang commented on HBASE-8741: --- It seems like the diff is using the following hashmap for the regionName-to-last-sequence mapping. The javadoc mentions: "It works in our use case as we use {@link HRegionInfo#getEncodedNameAsBytes()} as keys. For a given region, it always returns the same array." private Map<byte[], Long> latestSequenceNums = new HashMap<byte[], Long>(); However, if a region has been re-opened before the HLog rolls, then there will be 2 entries for the same region in this mapping, because the hashcodes for these 2 byte[]s will be different, right? Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs) Key: HBASE-8741 URL: https://issues.apache.org/jira/browse/HBASE-8741 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.98.0 Attachments: HBASE-8741-trunk-v6.1-rebased.patch, HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, HBASE-8741-v5.patch Currently, when opening a region, we find the maximum sequence ID from all its HFiles and then set the LogSequenceId of the log (in case the latter is at a smaller value). This works well in the recovered.edits case, as we are not writing to the region until we have replayed all of its previous edits. With distributed log replay, if we want to enable writes while a region is under recovery, we need to make sure that the logSequenceId > the maximum logSequenceId of the old regionserver. Otherwise, we might have a situation where new edits have the same (or smaller) sequenceIds. We can store region-level information in the WALTrailer; then this scenario could be avoided by: a) reading the trailer of the last completed file, i.e., the last wal file which has a trailer, and b) completely reading the last wal file (this file would not have the trailer, so it needs to be read completely). In the future, if we switch to multiple wal files, we could read the trailers of all completed WAL files and completely read the remaining incomplete files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
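The pitfall raised in this comment is easy to demonstrate: byte[] keys use identity hashCode/equals, so two equal-content arrays occupy two HashMap entries, while a sorted map built on Bytes.BYTES_COMPARATOR compares content. A tiny self-contained demo:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import org.apache.hadoop.hbase.util.Bytes;

public class ByteArrayKeyDemo {
  public static void main(String[] args) {
    byte[] k1 = Bytes.toBytes("region-1");
    byte[] k2 = Bytes.toBytes("region-1");  // same content, different reference

    Map<byte[], Long> hash = new HashMap<byte[], Long>();
    hash.put(k1, 1L);
    hash.put(k2, 2L);
    System.out.println(hash.size());        // 2 -- duplicate entries for one region

    Map<byte[], Long> sorted =
        new ConcurrentSkipListMap<byte[], Long>(Bytes.BYTES_COMPARATOR);
    sorted.put(k1, 1L);
    sorted.put(k2, 2L);
    System.out.println(sorted.size());      // 1 -- content-based comparison
  }
}
```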
[jira] [Created] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
Liyin Tang created HBASE-10275: -- Summary: [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing Key: HBASE-10275 URL: https://issues.apache.org/jira/browse/HBASE-10275 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang [HBASE-8741] has implemented the per-region sequence ID. It would be even better to guarantee that the sequencing is strictly monotonically increasing so that HLog-Based Async Replication is able to deliver transactions in order in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10275) [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing
[ https://issues.apache.org/jira/browse/HBASE-10275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862208#comment-13862208 ] Liyin Tang commented on HBASE-10275: The problem you have described is exactly what we want to resolve. Basically, if the sequenceID for each region is strictly monotonically increasing, then in the case of a region moving from A to B, the replication stream on B would know the gap/lag for that region in the previous replication stream on A. As you mentioned, but slightly different: the fix is to guarantee that the old hlog entries of a region from the previous region server have been fully replicated before starting to replicate this region from the new region server. [89-fb] Guarantee the sequenceID in each Region is strictly monotonic increasing Key: HBASE-10275 URL: https://issues.apache.org/jira/browse/HBASE-10275 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Liyin Tang [HBASE-8741] has implemented the per-region sequence ID. It would be even better to guarantee that the sequencing is strictly monotonically increasing so that HLog-Based Async Replication is able to deliver transactions in order in the case of region movements. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8741) Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs)
[ https://issues.apache.org/jira/browse/HBASE-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862218#comment-13862218 ] Liyin Tang commented on HBASE-8741: --- Good to know :) Not sure whether it is worth noting the above implication in the javadoc; basically, latestSequenceNums might contain duplicate keys for the same region. In 89-fb, we just use a ConcurrentSkipListMap, in case this map gets reused for other purposes. Anyway, thanks for the explanation! Nice feature indeed! Scope sequenceid to the region rather than regionserver (WAS: Mutations on Regions in recovery mode might have same sequenceIDs) Key: HBASE-8741 URL: https://issues.apache.org/jira/browse/HBASE-8741 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.98.0 Attachments: HBASE-8741-trunk-v6.1-rebased.patch, HBASE-8741-trunk-v6.2.1.patch, HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.2.2.patch, HBASE-8741-trunk-v6.3.patch, HBASE-8741-trunk-v6.4.patch, HBASE-8741-trunk-v6.patch, HBASE-8741-v0.patch, HBASE-8741-v2.patch, HBASE-8741-v3.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4-again.patch, HBASE-8741-v4.patch, HBASE-8741-v5-again.patch, HBASE-8741-v5.patch Currently, when opening a region, we find the maximum sequence ID from all its HFiles and then set the LogSequenceId of the log (in case the latter is at a smaller value). This works well in the recovered.edits case, as we are not writing to the region until we have replayed all of its previous edits. With distributed log replay, if we want to enable writes while a region is under recovery, we need to make sure that the logSequenceId > the maximum logSequenceId of the old regionserver. Otherwise, we might have a situation where new edits have the same (or smaller) sequenceIds. We can store region-level information in the WALTrailer; then this scenario could be avoided by: a) reading the trailer of the last completed file, i.e., the last wal file which has a trailer, and b) completely reading the last wal file (this file would not have the trailer, so it needs to be read completely). In the future, if we switch to multiple wal files, we could read the trailers of all completed WAL files and completely read the remaining incomplete files. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857630#comment-13857630 ] Liyin Tang commented on HBASE-8763: --- I totally vote for combining the MVCC and SeqId. Furthermore, it would be even more straightforward if the SeqId were not shared across all the Regions. Ideally, each region shall have its own monotonically increasing seq id. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857680#comment-13857680 ] Liyin Tang commented on HBASE-8763: --- Thanks for the jira! If the SeqId is already per-region, and we want to combine the MVCC with it, then how do we want to handle group commits across multiple regions? [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857820#comment-13857820 ] Liyin Tang commented on HBASE-8763: --- [~jeffreyz], I see. Thanks for the clarification and it makes sense to me now ! [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10083) [89-fb] Better error handling for the compound bloom filter
Liyin Tang created HBASE-10083: -- Summary: [89-fb] Better error handling for the compound bloom filter Key: HBASE-10083 URL: https://issues.apache.org/jira/browse/HBASE-10083 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Liyin Tang When the RegionServer fails to load a bloom block from HDFS due to a timeout or other reasons, it throws the exception and disables the entire bloom filter for this HFile. This behavior does not make much sense, especially for the compound bloom filter. Instead of disabling the bloom filter for the entire file, it could just return a potentially false positive result (true) and keep the bloom filter available. -- This message was sent by Atlassian JIRA (v6.1#6144)
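To make the proposed behavior concrete, here is a minimal sketch (not the actual patch; loadBloomBlock/checkBloom are assumed helpers): a failed bloom-block read answers "maybe present" for this one query instead of disabling the filter for the whole HFile.
{code}
boolean passesBloomFilter(byte[] row) {
  try {
    HFileBlock bloomBlock = loadBloomBlock(row);   // assumed helper; may hit HDFS
    return checkBloom(bloomBlock, row);
  } catch (IOException e) {
    LOG.warn("Failed to load a bloom block; treating as a potential hit", e);
    return true;  // a false positive is safe: the caller falls through to the data block
  }
}
{code}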
[jira] [Created] (HBASE-10009) Fix potential OOM exception in HTableMultiplexer
Liyin Tang created HBASE-10009: -- Summary: Fix potential OOM exception in HTableMultiplexer Key: HBASE-10009 URL: https://issues.apache.org/jira/browse/HBASE-10009 Project: HBase Issue Type: Bug Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Manukranth Kolloju Priority: Minor HTableMultiplexer is our thread-safe, non-blocking API. HTableMultiplexer.getHTable is supposed to cache HTable instances, but it fails to do so if it is called on a table with the same name but a different reference. Fix this behavior and add a unit test case in the existing TestHtableMultiplexer class. -- This message was sent by Atlassian JIRA (v6.1#6144)
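A sketch of the intended caching behavior (the field and surrounding class are hypothetical; the real implementation differs): key the cache by the table name's value, not by object reference, so two lookups with equal names share one HTable.
{code}
private final ConcurrentMap<String, HTable> tableCache =
    new ConcurrentHashMap<String, HTable>();

HTable getHTable(byte[] tableName) throws IOException {
  String key = Bytes.toString(tableName);   // value-based key, independent of the caller's reference
  HTable t = tableCache.get(key);
  if (t == null) {
    HTable fresh = new HTable(conf, tableName);
    HTable prev = tableCache.putIfAbsent(key, fresh);
    t = (prev != null) ? prev : fresh;      // another thread may have won the race
  }
  return t;
}
{code}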
[jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree
[ https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825519#comment-13825519 ] Liyin Tang commented on HBASE-9969: --- That's a very promising idea ! Will take a closer look. Nice work [~stepinto] ! Improve KeyValueHeap using loser tree - Key: HBASE-9969 URL: https://issues.apache.org/jira/browse/HBASE-9969 Project: HBase Issue Type: Improvement Components: Performance, regionserver Reporter: Chao Shi Assignee: Chao Shi Fix For: 0.98.0, 0.96.1, 0.94.15 Attachments: 9969-0.94.txt, hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, kvheap-benchmark.png, kvheap-benchmark.txt A loser tree is a better data structure than a binary heap here. It saves half of the comparisons on each next(), though the time complexity is still O(logN). Currently a scan or get will go through two KeyValueHeaps: one merges KVs read from multiple HFiles in a single store, the other merges results from multiple stores. This patch should improve both cases whenever CPU is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811). All of the optimization work is done in KeyValueHeap and does not change its public interfaces. The new code looks cleaner and is simpler to understand. -- This message was sent by Atlassian JIRA (v6.1#6144)
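For readers unfamiliar with the structure, here is a self-contained, from-scratch sketch of a k-way merge using a loser tree (illustrative only, not the KeyValueHeap patch). After initialization, each next() replays a single leaf-to-root path, i.e. about log2(k) comparisons per element; a binary heap pays for both a removal and an insertion.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class LoserTreeMergeSketch {

  /** Merges k sorted lists. Long.MIN_VALUE/MAX_VALUE are reserved as sentinels. */
  public static List<Long> merge(List<List<Long>> inputs) {
    final int k = inputs.size();
    List<Long> out = new ArrayList<Long>();
    if (k == 0) return out;
    long[] key = new long[k + 1];          // current head key per input; [k] is a sentinel
    int[] pos = new int[k];                // cursor per input
    for (int i = 0; i < k; i++) {
      key[i] = inputs.get(i).isEmpty() ? Long.MAX_VALUE : inputs.get(i).get(0);
    }
    key[k] = Long.MIN_VALUE;               // virtual player that wins every init match
    int[] tree = new int[k];               // tree[t] = loser at node t; tree[0] = winner
    Arrays.fill(tree, k);
    for (int p = 0; p < k; p++) {
      replay(tree, key, p, k);             // play each real input into the tree
    }
    for (int w = tree[0]; key[w] != Long.MAX_VALUE; w = tree[0]) {
      out.add(key[w]);
      List<Long> src = inputs.get(w);
      pos[w]++;
      key[w] = pos[w] < src.size() ? src.get(pos[w]) : Long.MAX_VALUE;
      replay(tree, key, w, k);             // only the winner's path is re-played
    }
    return out;
  }

  /** Walks from player p's leaf to the root, swapping with any stored loser that beats it. */
  private static void replay(int[] tree, long[] key, int p, int k) {
    int w = p;
    for (int t = (p + k) / 2; t > 0; t /= 2) {
      if (key[tree[t]] < key[w]) {         // stored loser beats current winner: swap roles
        int tmp = w; w = tree[t]; tree[t] = tmp;
      }
    }
    tree[0] = w;
  }
}
{code}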
[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan
[ https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726635#comment-13726635 ] Liyin Tang commented on HBASE-9102: --- Chao, you are right that the pre-load will run in a rate-limited fashion to make sure it won't pollute the block cache substantially. The pre-loading targets the large sequential scan case. The client is able to enable/disable it on a per-request basis. HFile block pre-loading for large sequential scan - Key: HBASE-9102 URL: https://issues.apache.org/jira/browse/HBASE-9102 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Liyin Tang The current HBase scan model cannot take full advantage of the aggregate disk throughput, especially for the large sequential scan cases. And for a large sequential scan, it is easy to predict which block to read next, so HBase can pre-load and decompress/decode these data blocks from HDFS into the block cache right before the current read point. Therefore, this jira is to optimize the large sequential scan performance by pre-loading the HFile blocks into the block cache in a streaming fashion so that the scan query can read from the cache directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9102) HFile block pre-loading for large sequential scan
Liyin Tang created HBASE-9102: - Summary: HFile block pre-loading for large sequential scan Key: HBASE-9102 URL: https://issues.apache.org/jira/browse/HBASE-9102 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Liyin Tang The current HBase scan model cannot take full advantage of the aggregate disk throughput, especially for the large sequential scan cases. And for a large sequential scan, it is easy to predict which block to read next, so HBase can pre-load and decompress/decode these data blocks from HDFS into the block cache right before the current read point. Therefore, this jira is to optimize the large sequential scan performance by pre-loading the HFile blocks into the block cache in a streaming fashion so that the scan query can read from the cache directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
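A minimal sketch of the pre-loading loop described here. Every name below (PRELOAD_DEPTH, nextBlockOffset, blockCache methods, rateLimiter) is hypothetical; it only illustrates decoding the next few blocks into the block cache ahead of the current read point, under a rate limit.
{code}
private final ExecutorService preloader = Executors.newSingleThreadExecutor();

void schedulePreload(final long currentOffset) {
  preloader.submit(new Runnable() {
    public void run() {
      for (int i = 1; i <= PRELOAD_DEPTH; i++) {             // e.g. the next 2-3 blocks
        long offset = nextBlockOffset(currentOffset, i);     // assumed block-index lookup
        if (!blockCache.containsBlock(offset)) {
          blockCache.cacheBlock(offset, readAndDecodeBlock(offset)); // decompress/decode once
        }
        rateLimiter.acquire();                               // cap the preload IO rate
      }
    }
  });
}
{code}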
[jira] [Commented] (HBASE-7266) [89-fb] Using pread for non-compaction read request
[ https://issues.apache.org/jira/browse/HBASE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725443#comment-13725443 ] Liyin Tang commented on HBASE-7266: --- Chao, we have switched all the read operations to pread in the 89-fb branch. There are 2 follow-up tasks for pread: 1) Have the DFSClient maintain a connection pool instead of creating a new connection for each pread operation. 2) Have HBase actively pre-load the next several blocks in a streaming fashion for large sequential scans (HBASE-9102) [89-fb] Using pread for non-compaction read request --- Key: HBASE-7266 URL: https://issues.apache.org/jira/browse/HBASE-7266 Project: HBase Issue Type: Improvement Reporter: Liyin Tang There are 2 kinds of read operations in HBase: pread and seek+read. Pread, positional read, is stateless and creates a new connection between the DFSClient and DataNode for each operation. Seek+read, on the other hand, seeks to a specific position and prefetches blocks from data nodes. The benefit of seek+read is that it caches the prefetch result, but the downside is that it is stateful and needs to be synchronized. So far, both compaction and scan are using seek+read, which caused some resource contention. So using pread for the scan request can avoid the resource contention. In addition, the region server is able to do the prefetch for the scan request (HBASE-6874) so that it won't be necessary to let the DFSClient prefetch the data any more. I will run the scan benchmark (with no block cache) to verify the performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
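In HDFS API terms, the two read paths look like this (FSDataInputStream really does offer both; the buffer sizes and offsets are illustrative):
{code}
FSDataInputStream in = fs.open(path);
byte[] buf = new byte[blockSize];

// pread: stateless, positional, safe to share across handler threads
in.read(blockOffset, buf, 0, buf.length);   // does not move the stream's file pointer

// seek+read: stateful, must be synchronized per stream, but lets the
// DFSClient keep prefetching the remainder of the DFS block
synchronized (in) {
  in.seek(blockOffset);
  in.readFully(buf, 0, buf.length);
}
{code}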
[jira] [Commented] (HBASE-9102) HFile block pre-loading for large sequential scan
[ https://issues.apache.org/jira/browse/HBASE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13725515#comment-13725515 ] Liyin Tang commented on HBASE-9102: --- It is true that the OS caches the compressed/encoded blocks, and the DFSClient non-pread operation is also able to pre-load all the bytes up to that DFS block. This feature is to pre-load (decompress/decode) these data blocks in addition to the OS cache/disk read-ahead. Also, the scan prefetch is currently implemented at the RegionScanner level. I think it is a good idea to implement some prefetch logic in the HBase client as well. HFile block pre-loading for large sequential scan - Key: HBASE-9102 URL: https://issues.apache.org/jira/browse/HBASE-9102 Project: HBase Issue Type: Improvement Affects Versions: 0.89-fb Reporter: Liyin Tang Assignee: Liyin Tang The current HBase scan model cannot take full advantage of the aggregate disk throughput, especially for the large sequential scan cases. And for a large sequential scan, it is easy to predict which block to read next, so HBase can pre-load and decompress/decode these data blocks from HDFS into the block cache right before the current read point. Therefore, this jira is to optimize the large sequential scan performance by pre-loading the HFile blocks into the block cache in a streaming fashion so that the scan query can read from the cache directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6930: -- Attachment: (was: D5841.2.patch) [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: HBASE-6930.diff When processing the multiPut, multiMutations or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in these batches. If there are duplicated row keys in these batches, previously the IPC handler thread would repeatedly acquire the same row lock again and again. So the optimization is to sort each batch operation based on the row key on the client side, and skip acquiring the same row lock repeatedly on the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6930: -- Attachment: HBASE-6930.diff [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: HBASE-6930.diff When processing the multiPut, multiMutations or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in these batches. If there are duplicated row keys in these batches, previously the IPC handler thread would repeatedly acquire the same row lock again and again. So the optimization is to sort each batch operation based on the row key on the client side, and skip acquiring the same row lock repeatedly on the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694916#comment-13694916 ] Liyin Tang commented on HBASE-6930: --- The patch here seems to be attached to a wrong jira. Sorry about the confusion. I have re-attached the patch here. [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: HBASE-6930.diff When processing the multiPut, multiMutations or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in these batches. If there are duplicated row keys in these batches, previously the IPC handler thread would repeatedly acquire the same row lock again and again. So the optimization is to sort each batch operation based on the row key on the client side, and skip acquiring the same row lock repeatedly on the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
[ https://issues.apache.org/jira/browse/HBASE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6930: -- Attachment: (was: D5841.1.patch) [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: HBASE-6930.diff When processing the multiPut, multiMutations or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in these batches. If there are duplicated row keys in these batches, previously the IPC handler thread would repeatedly acquire the same row lock again and again. So the optimization is to sort each batch operation based on the row key on the client side, and skip acquiring the same row lock repeatedly on the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8806) Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows.
[ https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13694487#comment-13694487 ] Liyin Tang commented on HBASE-8806: --- Is this issue similar to HBASE-6930 ? We solved the problem by sorting the rows for each multiput batch on the client side. Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for duplicate rows. Key: HBASE-8806 URL: https://issues.apache.org/jira/browse/HBASE-8806 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: rahul gidwani Fix For: 0.95.2, 0.94.10 Attachments: HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch If we already have the lock in the doMiniBatchMutation we don't need to re-acquire it. The solution would be to keep a cache of the rowKeys already locked for a miniBatchMutation, and if we already have the rowKey in the cache, we don't repeatedly try to acquire the lock. A fix to this problem would be to keep a set of rows we already locked and not try to acquire the lock for these rows. We have tested this fix in our production environment and it has improved replication performance quite a bit. We saw a replication batch go from 3+ minutes to less than 10 seconds for batches with duplicate row keys.
{code}
static int ACQUIRE_LOCK_COUNT = 0;

@Test
public void testRedundantRowKeys() throws Exception {
  final int batchSize = 10;
  String tableName = getClass().getSimpleName();
  Configuration conf = HBaseConfiguration.create();
  conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
  MockHRegion region = (MockHRegion) TestHRegion.initHRegion(Bytes.toBytes(tableName), tableName, conf, Bytes.toBytes("a"));
  List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
  int i = 0;
  while (i < batchSize) {
    if (i % 2 == 0) {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), null));
    } else {
      someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), null));
    }
    i++;
  }
  long startTime = System.currentTimeMillis();
  region.batchMutate(someBatch.toArray(new Pair[0]));
  long endTime = System.currentTimeMillis();
  long duration = endTime - startTime;
  System.out.println("duration: " + duration + " ms");
  assertEquals(2, ACQUIRE_LOCK_COUNT);
}

@Override
public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) throws IOException {
  ACQUIRE_LOCK_COUNT++;
  return super.getLock(lockid, row, waitForLock);
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
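The server-side dedup idea, as a sketch (acquireRowLock is an assumed helper; the real method name varies by version): remember which rows this mini-batch has already locked and skip re-acquiring them.
{code}
Set<byte[]> lockedRows = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
for (Mutation m : batch) {
  byte[] row = m.getRow();
  if (lockedRows.add(row)) {   // true only the first time this row appears in the batch
    acquireRowLock(row);       // assumed helper, not the actual HRegion API
  }
}
// All mutations proceed; afterwards each distinct row's lock is released exactly once.
{code}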
[jira] [Updated] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7055: -- Attachment: Tier Based Compaction Settings.pdf The tier based compaction settings. port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes) -- Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Components: Compaction Affects Versions: 0.95.2 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, HBASE-6371-v3-refactor-only-squashed.patch, HBASE-6371-v4-refactor-only-squashed.patch, HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687476#comment-13687476 ] Liyin Tang commented on HBASE-7055: --- Unfortunately, we haven't configured this tier-based compaction for our applications, either. port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes) -- Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Components: Compaction Affects Versions: 0.95.2 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.2 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, HBASE-6371-v3-refactor-only-squashed.patch, HBASE-6371-v4-refactor-only-squashed.patch, HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, HBASE-7055-v7.patch, HBASE-7055-v7.patch, Tier Based Compaction Settings.pdf See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5263) Preserving cached data on compactions through cache-on-write
[ https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-5263: - Assignee: Rishit Shroff (was: Mikhail Bautin) Preserving cached data on compactions through cache-on-write Key: HBASE-5263 URL: https://issues.apache.org/jira/browse/HBASE-5263 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Rishit Shroff Priority: Minor We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the block cache on compactions if cache-on-write is enabled. However, it would be ideal to reduce the effect compactions have on the cached data. For every block we are writing for a compacted file we can decide whether it needs to be cached based on whether the original blocks containing the same data were already in cache. More precisely, for every HFile reader in a compaction we can maintain a boolean flag saying whether the current key-value came from a disk IO or the block cache. In the HFile writer for the compaction's output we can maintain a flag that is set if any of the key-values in the block being written came from a cached block, use that flag at the end of a block to decide whether to cache-on-write the block, and reset the flag to false on a block boundary. If such an inclusive approach would still trash the cache, we could restrict the total number of blocks to be cached per output HFile, switch to an "and" logic instead of an "or" logic for deciding whether to cache an output file block, or only cache a certain percentage of output file blocks that contain some of the previously cached data. Thanks to Nicolas for this elegant online algorithm idea! -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
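The per-block "or" rule described above, as a sketch. All names here are hypothetical; the real scanner/writer interfaces differ.
{code}
boolean anyKvFromCache = false;                       // reset at each output-block boundary
while (compactionScanner.next(kv)) {
  anyKvFromCache |= compactionScanner.lastKvCameFromBlockCache(); // assumed per-reader flag
  writer.append(kv);
  if (writer.finishedBlock()) {
    writer.cacheLastBlockOnWrite(anyKvFromCache);     // cache only blocks carrying cached data
    anyKvFromCache = false;
  }
}
{code}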
[jira] [Commented] (HBASE-4433) avoid extra next (potentially a seek) if done with column/row
[ https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13589763#comment-13589763 ] Liyin Tang commented on HBASE-4433: --- Hi Lars, the jira Kannan mentioned is [HBASE-5987] HFileBlockIndex improvements. By looking ahead at the next indexed key, the HBase internal reader knows whether to keep scanning the current DataBlock or look up the index. This feature avoids additional index lookup overhead when multiple requests are sequentially scanning the HFile data blocks. Actually, we have a list of jiras in our FB internal HBase release. Do you know a proper place we could share this work with more hbase-dev ? avoid extra next (potentially a seek) if done with column/row - Key: HBASE-4433 URL: https://issues.apache.org/jira/browse/HBASE-4433 Project: HBase Issue Type: Improvement Reporter: Kannan Muthukkaruppan Assignee: Kannan Muthukkaruppan Fix For: 0.92.0 [Noticed this in 89, but quite likely true of trunk as well.] When we are done with the requested column(s) the code still does an extra next() call before it realizes that it is actually done. This extra next() call could potentially result in an unnecessary extra block load. This is likely to be especially bad for CFs where the KVs are large blobs where each KV may be occupying a block of its own. So the next() can often load a new unrelated block unnecessarily. -- For the simple case of reading say the top-most column in a row in a single file, where each column (KV) was say a block of its own-- it seems that we are reading 3 blocks, instead of 1 block! I am working on a simple patch and with that the number of seeks is down to 2. [There is still an extra seek left. I think there were two levels of extra/unnecessary next() we were doing without actually confirming that the next was needed. One at the StoreScanner/ScanQueryMatcher level which this diff avoids. I think the other is at hfs.next() (at the storefile scanner level) that's happening whenever an HFile scanner serves out data-- and perhaps that's the additional seek that we need to avoid. But I want to tackle this optimization first as the two issues seem unrelated.] -- The basic idea of the patch I am working on/testing is as follows. The ExplicitColumnTracker currently returns INCLUDE to the ScanQueryMatcher if the KV needs to be included and then if done, only in the next call it returns the appropriate SEEK_NEXT_COL or SEEK_NEXT_ROW hint. For the cases when ExplicitColumnTracker knows it is done with a particular column/row, the patch attempts to combine the INCLUDE code and done hint into a single match code-- INCLUDE_AND_SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_ROW. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
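In outline, the combined hints look like this (the match-code names come from the jira itself; the tracker logic below is a simplified sketch, not the actual patch):
{code}
MatchCode checkColumn(byte[] qualifier) {
  ColumnCount col = findColumn(qualifier);        // assumed lookup in the explicit column list
  if (col == null) return MatchCode.SEEK_NEXT_COL;
  if (isLastRequestedVersion(col)) {              // assumed helper
    return isLastRequestedColumn(col)
        ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW     // done with the row: no extra next()
        : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;    // done with this column only
  }
  return MatchCode.INCLUDE;
}
{code}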
[jira] [Created] (HBASE-7444) [89-fb] Update the default user name in MiniHBaseCluster
Liyin Tang created HBASE-7444: - Summary: [89-fb] Update the default user name in MiniHBaseCluster Key: HBASE-7444 URL: https://issues.apache.org/jira/browse/HBASE-7444 Project: HBase Issue Type: Test Reporter: Liyin Tang Priority: Minor Currently we are using $username.hrs.$index as the default user name in MiniHBaseCluster, which actually is not a legal user name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7416) [89-fb] Tier compaction with fixed boundary option
Liyin Tang created HBASE-7416: - Summary: [89-fb] Tier compaction with fixed boundary option Key: HBASE-7416 URL: https://issues.apache.org/jira/browse/HBASE-7416 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Assignee: Chen Jin Currently, in tier compaction the age-based algorithm considers an HFile's age on disk relative to the current time, so the tiers are actually shifting over time. In order to best use our prior information about how applications consume the data, it needs another feature to perceive the tiers relative to a fixed time point. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7416) [89-fb] Tier compaction with fixed boundary option
[ https://issues.apache.org/jira/browse/HBASE-7416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7416: -- Assignee: (was: Chen Jin) [89-fb] Tier compaction with fixed boundary option --- Key: HBASE-7416 URL: https://issues.apache.org/jira/browse/HBASE-7416 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Currently, in tier compaction the age-based algorithm considers an HFile's age on disk relative to the current time, so the tiers are actually shifting over time. In order to best use our prior information about how applications consume the data, it needs another feature to perceive the tiers relative to a fixed time point. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5776) HTableMultiplexer
[ https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-5776: - Assignee: binlijin (was: Liyin Tang) HTableMultiplexer -- Key: HBASE-5776 URL: https://issues.apache.org/jira/browse/HBASE-5776 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: binlijin Attachments: ASF.LICENSE.NOT.GRANTED--D2775.1.patch, ASF.LICENSE.NOT.GRANTED--D2775.1.patch, ASF.LICENSE.NOT.GRANTED--D2775.2.patch, ASF.LICENSE.NOT.GRANTED--D2775.2.patch, ASF.LICENSE.NOT.GRANTED--D2775.3.patch, ASF.LICENSE.NOT.GRANTED--D2775.4.patch, ASF.LICENSE.NOT.GRANTED--D2775.5.patch, HBASE-5776-trunk.patch, HBASE-5776-trunk-V2.patch There is a known issue in the HBase client that a single slow/dead region server could slow down the multiput operations across all the region servers. So the HBase client will be as slow as the slowest region server in the cluster. To solve this problem, HTableMultiplexer will separate the multiput submitting threads from the flush threads, which means the multiput operation will be a non-blocking operation. The submitting thread will shard all the puts into different queues based on their destination region server and return immediately. The flush threads will flush these puts from each queue to its destination region server. Currently the HTableMultiplexer only supports the put operation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
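The queue-per-server sharding, as a sketch (locateRegionServer and getOrCreateQueue are assumed helpers, not the real class's API):
{code}
ConcurrentMap<HServerAddress, LinkedBlockingQueue<Put>> queues =
    new ConcurrentHashMap<HServerAddress, LinkedBlockingQueue<Put>>();

public boolean put(byte[] tableName, Put put) {
  HServerAddress server = locateRegionServer(tableName, put.getRow()); // assumed lookup
  LinkedBlockingQueue<Put> queue = getOrCreateQueue(queues, server);   // assumed helper
  return queue.offer(put);  // non-blocking: fails fast instead of waiting on a slow server
}
// Dedicated flush threads drain each queue and issue multiputs, so one slow or
// dead region server only backs up its own queue.
{code}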
[jira] [Created] (HBASE-7348) [89-fb] Add some statistics from DFSClient to RegionServerMetrics
Liyin Tang created HBASE-7348: - Summary: [89-fb] Add some statistics from DFSClient to RegionServerMetrics Key: HBASE-7348 URL: https://issues.apache.org/jira/browse/HBASE-7348 Project: HBase Issue Type: Improvement Reporter: Liyin Tang DFSClient actually collects a number of useful statistics, such as bytesLocalRead, bytesLocalRackRead and so on. So this diff is going to merge these metrics into the RegionServerMetrics. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7275) [89-fb] Fixing some minor bugs in 89-fb branch
Liyin Tang created HBASE-7275: - Summary: [89-fb] Fixing some minor bugs in 89-fb branch Key: HBASE-7275 URL: https://issues.apache.org/jira/browse/HBASE-7275 Project: HBase Issue Type: Bug Reporter: Liyin Tang Priority: Minor [89-fb] Fixing some minor bugs in 89-fb branch based on the findBugs report -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.
[ https://issues.apache.org/jira/browse/HBASE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7267: -- Description: HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile IFF the maxSequenceID among the compaction candidates is equal to the maxSequenceID among all the on-disk hfiles. was: HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile IFF the maxSequenceID among the compaction candidates is equal to the the maxSequenceID among all the on-disk hfiles. [89-fb] Only create the dummy hfile for the compaction if necessary. Key: HBASE-7267 URL: https://issues.apache.org/jira/browse/HBASE-7267 Project: HBase Issue Type: Bug Reporter: Liyin Tang HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile IFF the maxSequenceID among the compaction candidates is equal to the maxSequenceID among all the on-disk hfiles. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7276) [89-fb] Optimize the read/write requests metrics in the RegionServer level
Liyin Tang created HBASE-7276: - Summary: [89-fb] Optimize the read/write requests metrics in the RegionServer level Key: HBASE-7276 URL: https://issues.apache.org/jira/browse/HBASE-7276 Project: HBase Issue Type: Improvement Reporter: Liyin Tang In HBase, each RegionServer will host a set of Regions and both of them keep track of the read/write request metrics. So the total number of read/write requests among all the Regions shall be equal to the total number from the RegionServer. We shall optimize the code to remove the redundant metrics at the RegionServer level, and merge the Region-level metrics into the RegionServer level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
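The invariant being relied on, spelled out as a sketch (the accessor names are assumed, not necessarily the 0.89-fb API):
{code}
long regionReadTotal = 0;
long regionWriteTotal = 0;
for (HRegion region : regionServer.getOnlineRegions()) {
  regionReadTotal += region.getReadRequestsCount();
  regionWriteTotal += region.getWriteRequestsCount();
}
// After the cleanup, the RegionServer-level metrics would simply be these
// sums rather than separately maintained counters.
{code}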
[jira] [Created] (HBASE-7266) [89-fb] Using pread for non-compaction read request
Liyin Tang created HBASE-7266: - Summary: [89-fb] Using pread for non-compaction read request Key: HBASE-7266 URL: https://issues.apache.org/jira/browse/HBASE-7266 Project: HBase Issue Type: Improvement Reporter: Liyin Tang There are 2 kinds of read operations in HBase: pread and seek+read. Pread, positional read, is stateless and creates a new connection between the DFSClient and DataNode for each operation. Seek+read, on the other hand, seeks to a specific position and prefetches blocks from data nodes. The benefit of seek+read is that it caches the prefetch result, but the downside is that it is stateful and needs to be synchronized. So far, both compaction and scan are using pread, which caused some resource contention. So using pread for the scan request can avoid the resource contention. In addition, the region server is able to do the prefetch for the scan request (HBASE-6874) so that it won't be necessary to let the DFSClient prefetch the data any more. I will run the scan benchmark (with no block cache) to verify the performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7266) [89-fb] Using pread for non-compaction read request
[ https://issues.apache.org/jira/browse/HBASE-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7266: -- Description: There are 2 kinds of read operations in HBase: pread and seek+read. Pread, positional read, is stateless and creates a new connection between the DFSClient and DataNode for each operation. Seek+read, on the other hand, seeks to a specific position and prefetches blocks from data nodes. The benefit of seek+read is that it caches the prefetch result, but the downside is that it is stateful and needs to be synchronized. So far, both compaction and scan are using seek+read, which caused some resource contention. So using pread for the scan request can avoid the resource contention. In addition, the region server is able to do the prefetch for the scan request (HBASE-6874) so that it won't be necessary to let the DFSClient prefetch the data any more. I will run the scan benchmark (with no block cache) to verify the performance. was: There are 2 kinds of read operations in HBase: pread and seek+read. Pread, positional read, is stateless and creates a new connection between the DFSClient and DataNode for each operation. Seek+read, on the other hand, seeks to a specific position and prefetches blocks from data nodes. The benefit of seek+read is that it caches the prefetch result, but the downside is that it is stateful and needs to be synchronized. So far, both compaction and scan are using pread, which caused some resource contention. So using pread for the scan request can avoid the resource contention. In addition, the region server is able to do the prefetch for the scan request (HBASE-6874) so that it won't be necessary to let the DFSClient prefetch the data any more. I will run the scan benchmark (with no block cache) to verify the performance. [89-fb] Using pread for non-compaction read request --- Key: HBASE-7266 URL: https://issues.apache.org/jira/browse/HBASE-7266 Project: HBase Issue Type: Improvement Reporter: Liyin Tang There are 2 kinds of read operations in HBase: pread and seek+read. Pread, positional read, is stateless and creates a new connection between the DFSClient and DataNode for each operation. Seek+read, on the other hand, seeks to a specific position and prefetches blocks from data nodes. The benefit of seek+read is that it caches the prefetch result, but the downside is that it is stateful and needs to be synchronized. So far, both compaction and scan are using seek+read, which caused some resource contention. So using pread for the scan request can avoid the resource contention. In addition, the region server is able to do the prefetch for the scan request (HBASE-6874) so that it won't be necessary to let the DFSClient prefetch the data any more. I will run the scan benchmark (with no block cache) to verify the performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.
Liyin Tang created HBASE-7267: - Summary: [89-fb] Only create the dummy hfile for the compaction if necessary. Key: HBASE-7267 URL: https://issues.apache.org/jira/browse/HBASE-7267 Project: HBase Issue Type: Bug Reporter: Liyin Tang HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile iff the maxSequenceID among the compaction candidates is equal the the maxSequenceID among all the on-disk hfiles. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7267) [89-fb] Only create the dummy hfile for the compaction if necessary.
[ https://issues.apache.org/jira/browse/HBASE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7267: -- Description: HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile IFF the maxSequenceID among the compaction candidates is equal to the the maxSequenceID among all the on-disk hfiles. was: HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile iff the maxSequenceID among the compaction candidates is equal the the maxSequenceID among all the on-disk hfiles. [89-fb] Only create the dummy hfile for the compaction if necessary. Key: HBASE-7267 URL: https://issues.apache.org/jira/browse/HBASE-7267 Project: HBase Issue Type: Bug Reporter: Liyin Tang HBASE-6059 introduced a new behavior where the compaction would create the HFileWriter no matter whether there is any key/value as the output or not. This new behavior actually conflicts with HBASE-5199 (Delete out of TTL store files before compaction selection) so that compacting the expired hfiles would generate one more expired hfile. Actually we only need to create the dummy hfile IFF the maxSequenceID among the compaction candidates is equal to the the maxSequenceID among all the on-disk hfiles. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7164) [89-fb] Using HFileOutputFormat as MapOutputFormat
Liyin Tang created HBASE-7164: - Summary: [89-fb] Using HFileOutputFormat as MapOutputFormat Key: HBASE-7164 URL: https://issues.apache.org/jira/browse/HBASE-7164 Project: HBase Issue Type: New Feature Reporter: Liyin Tang Priority: Minor Add one more option in TableMapReduceUtil to initialize a map-only job which takes TableInputFormat as the map input format and HFileOutputFormat as the map output format. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
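A hedged sketch of what such a job setup might look like with the mapreduce-package classes of that era (treat the exact wiring as an assumption, not the actual 89-fb change):
{code}
Job job = new Job(conf, "table-scan-to-hfiles");
TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class,
    ImmutableBytesWritable.class, KeyValue.class, job);
job.setOutputFormatClass(HFileOutputFormat.class); // HFiles straight out of the map phase
job.setNumReduceTasks(0);                          // map-only: skip the shuffle/sort
FileOutputFormat.setOutputPath(job, outputDir);
{code}
Note that with zero reducers each mapper must emit its KeyValues in sorted order, since HFileOutputFormat expects sorted input.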
[jira] [Commented] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7
[ https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493401#comment-13493401 ] Liyin Tang commented on HBASE-7106: --- Gustavo Anatoly: I didn't fully understand your questions :) The pom change is orthogonal with the code change. Jimmy, the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. However, the fix in HBASE-6206 will skip the NULL qualifier:
-set.add(qualifier);
+if (qualifier != null) {
+  set.add(qualifier);
+}
I think the correct fix shall be:
if (qualifier != null) {
  set.add(qualifier);
} else {
  set.add(HConstants.EMPTY_BYTE_ARRAY);
}
[89-fb] Fix the NPE in unit tests for JDK7 -- Key: HBASE-7106 URL: https://issues.apache.org/jira/browse/HBASE-7106 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user installed the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7
[ https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13493478#comment-13493478 ] Liyin Tang commented on HBASE-7106: --- Gustavo Anatoly, sure ! [89-fb] Fix the NPE in unit tests for JDK7 -- Key: HBASE-7106 URL: https://issues.apache.org/jira/browse/HBASE-7106 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user installed the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6371: -- Attachment: (was: HBase_Tier_Base_Compaction.pdf) [89-fb] Tier based compaction - Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Attachments: HBASE-6371-089fb-commit.patch, HBase_Tier_Base_Compaction.pdf Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access-pattern is particularly obvious even though there is no way to control the compaction algorithm in those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to oldness of the data that is present in those files. For each level, parameters like compaction ratio, minimum number of store-files in each compaction may be different. Number of levels, time-ranges, and parameters for each level will be configurable online on a per-column family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6371: -- Attachment: HBase_Tier_Base_Compaction.pdf [89-fb] Tier based compaction - Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Attachments: HBASE-6371-089fb-commit.patch, HBase_Tier_Base_Compaction.pdf Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access-pattern is particularly obvious even though there is no way to control the compaction algorithm in those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to oldness of the data that is present in those files. For each level, parameters like compaction ratio, minimum number of store-files in each compaction may be different. Number of levels, time-ranges, and parameters for each level will be configurable online on a per-column family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7
Liyin Tang created HBASE-7106: - Summary: [89-fb] Fix the NPE in unit tests for JDK7 Key: HBASE-7106 URL: https://issues.apache.org/jira/browse/HBASE-7106 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial In JDK7, it throws an NPE if you put a NULL into a TreeSet. So the easy fix is to skip putting the NULL qualifier into the family map for the GET and SCAN objects, and everything else shall be backward compatible. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7
[ https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7106: -- Description: In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user install the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 was: In JDK7, it throws an NPE if you put a NULL into a TreeSet. So the easy fix is to skip putting the NULL qualifier into the family map for the GET and SCAN objects, and everything else shall be backward compatible. [89-fb] Fix the NPE in unit tests for JDK7 -- Key: HBASE-7106 URL: https://issues.apache.org/jira/browse/HBASE-7106 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user install the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7106) [89-fb] Fix the NPE in unit tests for JDK7
[ https://issues.apache.org/jira/browse/HBASE-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-7106: -- Description: In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user installed the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 was: In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user install the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 [89-fb] Fix the NPE in unit tests for JDK7 -- Key: HBASE-7106 URL: https://issues.apache.org/jira/browse/HBASE-7106 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial In JDK7, it throws an NPE if you put a NULL into a TreeSet. And in the unit tests, a user can add a NULL qualifier into the family map for GET or SCAN. So we shall do the following: 1) Make sure the semantics of the NULL column qualifier is equal to that of the EMPTY_BYTE_ARRAY column qualifier. 2) An easy fix is to use the EMPTY_BYTE_ARRAY qualifier to replace the NULL qualifier in the family map for the GET or SCAN objects, and everything else shall be backward compatible. 3) Add a jdk option in the pom.xml (Assuming user installed the fb packaged jdk) eg: mvn test -Dtest=TestFromClientSide -Pjdk7 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
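For illustration, the failure mode and the normalization in isolation (a minimal sketch, not the actual patch):
{code}
// The JDK7 behavior being worked around:
TreeSet<byte[]> set = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
set.add(null);            // JDK7: NullPointerException from the compare call

// The fix described above: normalize NULL to the empty qualifier instead
byte[] qualifier = null;
set.add(qualifier == null ? HConstants.EMPTY_BYTE_ARRAY : qualifier);
{code}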
[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6371: -- Attachment: HBase_Tier_Base_Compaction.pdf The design doc for HBase Tier-based Compaction from Akashnil. [89-fb] Tier based compaction - Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Attachments: HBASE-6371-089fb-commit.patch, HBase_Tier_Base_Compaction.pdf Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access-pattern is particularly obvious even though there is no way to control the compaction algorithm in those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to oldness of the data that is present in those files. For each level, parameters like compaction ratio, minimum number of store-files in each compaction may be different. Number of levels, time-ranges, and parameters for each level will be configurable online on a per-column family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6371) [89-fb] Level based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-6371: - Assignee: Liyin Tang (was: Akashnil) [89-fb] Level based compaction -- Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access-pattern is particularly obvious even though there is no way to control the compaction algorithm in those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to oldness of the data that is present in those files. For each level, parameters like compaction ratio, minimum number of store-files in each compaction may be different. Number of levels, time-ranges, and parameters for each level will be configurable online on a per-column family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6371: -- Summary: [89-fb] Tier based compaction (was: [89-fb] Level based compaction) [89-fb] Tier based compaction - Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access-pattern is particularly obvious even though there is no way to control the compaction algorithm in those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to oldness of the data that is present in those files. For each level, parameters like compaction ratio, minimum number of store-files in each compaction may be different. Number of levels, time-ranges, and parameters for each level will be configurable online on a per-column family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6371) [89-fb] Tier based compaction
[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475952#comment-13475952 ] Liyin Tang commented on HBASE-6371: --- As Nicolas suggested, renamed the jira to tier-based compaction. [89-fb] Tier based compaction - Key: HBASE-6371 URL: https://issues.apache.org/jira/browse/HBASE-6371 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Liyin Tang Labels: noob Currently, compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less often, and very recent data is likely to be in the block cache; both considerations make it inefficient to compact such files as aggressively as other files. In some use cases the access pattern is particularly obvious, yet there is no way to tune the compaction algorithm for those cases. In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to the age of the data they contain. For each level, parameters such as the compaction ratio and the minimum number of store-files per compaction may differ. The number of levels, the time ranges, and the per-level parameters will be configurable online on a per-column-family basis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
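[Editor's note] To make the selection idea above concrete, here is a minimal sketch of age-based tier assignment: bucket candidate store-files into tiers by data age, so each tier can then apply its own compaction parameters. The class name, tier boundaries, and per-tier values are illustrative assumptions, not the actual patch; see HBASE-6371-089fb-commit.patch and the attached design doc for the real implementation.

{code:title=TierSelectionSketch.java}
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of age-based tier assignment, not the actual patch.
// In the real feature, tier boundaries and per-tier parameters are
// configurable online on a per-column-family basis.
public class TierSelectionSketch {
  // Upper age bound (ms) of each tier; files older than the last bound
  // fall into the final "cold" tier. Example values only.
  static final long[] TIER_MAX_AGE_MS = {
      3600_000L,          // tier 0: data under 1 hour old
      24 * 3600_000L,     // tier 1: under 1 day
      7 * 24 * 3600_000L  // tier 2: under 1 week
  };

  static int tierOf(long fileMaxTimestamp, long now) {
    long age = now - fileMaxTimestamp;
    for (int t = 0; t < TIER_MAX_AGE_MS.length; t++) {
      if (age <= TIER_MAX_AGE_MS[t]) {
        return t;
      }
    }
    return TIER_MAX_AGE_MS.length; // coldest tier
  }

  // Group candidate files (represented here only by their newest cell
  // timestamp) into tiers; each tier would then run its own selection
  // with tier-specific compaction ratio and min-files settings.
  static List<List<Long>> groupByTier(List<Long> fileMaxTimestamps, long now) {
    List<List<Long>> tiers = new ArrayList<>();
    for (int t = 0; t <= TIER_MAX_AGE_MS.length; t++) {
      tiers.add(new ArrayList<Long>());
    }
    for (long ts : fileMaxTimestamps) {
      tiers.get(tierOf(ts, now)).add(ts);
    }
    return tiers;
  }
}
{code}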
[jira] [Created] (HBASE-6968) Several HBase write perf improvements
Liyin Tang created HBASE-6968: - Summary: Several HBase write perf improvements Key: HBASE-6968 URL: https://issues.apache.org/jira/browse/HBASE-6968 Project: HBase Issue Type: Improvement Reporter: Liyin Tang There are two improvements in this jira: 1) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. 2) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6968) Several HBase write perf improvements
[ https://issues.apache.org/jira/browse/HBASE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6968: -- Description: Here are 2 HBase write performance improvements we recently found. 1) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. In HLog.java: orig: {code:title=HLog.java} newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf)); {code} new: {code} newWriter = createWriter(fs, newPath, conf); {code} 2) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. orig: {code:title=HBaseRpcMetrics.java} public synchronized void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { m = create(name); } m.inc(amt); } {code} new: {code} public void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { synchronized (this) { if ((m = get(name)) == null) { m = create(name); } } } m.inc(amt); } {code} ===== orig: {code:title=MemStoreFlusher.java} public synchronized void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } {code} new: {code} public void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } private synchronized void flushSomeRegions() { if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) { return; // double check the global memstore size inside of the synchronized block. } ... } {code} was: There are two improvements in this jira: 1) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. 2) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. Several HBase write perf improvements Key: HBASE-6968 URL: https://issues.apache.org/jira/browse/HBASE-6968 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Here are 2 HBase write performance improvements we recently found. 1) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. In HLog.java: orig: {code:title=HLog.java} newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf)); {code} new: {code} newWriter = createWriter(fs, newPath, conf); {code} 2) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case.
orig: {code:title=HBaseRpcMetrics.java} public synchronized void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { m = create(name); } m.inc(amt); } {code} new: {code} public void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { synchronized (this) { if ((m = get(name)) == null) { m = create(name); } } } m.inc(amt); } {code} ===== orig: {code:title=MemStoreFlusher.java} public synchronized void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } {code} new: {code} public void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } private synchronized void flushSomeRegions() { if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) { return; // double check the global memstore size inside of the synchronized block. } ... } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
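[Editor's note] A note on the pattern above: the double-checked fast path is only safe if get(name) reads from a structure that is itself safe for concurrent reads. As a standalone illustration, here is the same shape using ConcurrentHashMap and AtomicLong as stand-ins for the HBase types (a hypothetical class, not HBase code):

{code:title=DoubleCheckedRegistrySketch.java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Standalone illustration of the double-checked pattern in the patch:
// probe lock-free first, lock only on the miss path, and re-check
// inside the lock so two racing threads cannot both create the metric.
// AtomicLong stands in for MetricsTimeVaryingRate.
public class DoubleCheckedRegistrySketch {
  private final ConcurrentMap<String, AtomicLong> metrics =
      new ConcurrentHashMap<String, AtomicLong>();

  public void inc(String name, int amt) {
    AtomicLong m = metrics.get(name);   // fast path: no lock taken
    if (m == null) {
      synchronized (this) {
        m = metrics.get(name);          // re-check under the lock
        if (m == null) {
          m = new AtomicLong();
          metrics.put(name, m);
        }
      }
    }
    m.addAndGet(amt);
  }
}
{code}

With a ConcurrentHashMap one could equally use putIfAbsent instead of the synchronized block; the explicit double check is shown here only to mirror the structure of the patch.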
[jira] [Updated] (HBASE-6968) Several HBase write perf improvements
[ https://issues.apache.org/jira/browse/HBASE-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6968: -- Description: Here are 2 recent HBase write performance improvements: 1) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. In HLog.java: orig: {code:title=HLog.java} newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf)); {code} new: {code} newWriter = createWriter(fs, newPath, conf); {code} 2) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. orig: {code:title=HBaseRpcMetrics.java} public synchronized void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { m = create(name); } m.inc(amt); } {code} new: {code} public void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { synchronized (this) { if ((m = get(name)) == null) { m = create(name); } } } m.inc(amt); } {code} ===== orig: {code:title=MemStoreFlusher.java} public synchronized void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } {code} new: {code} public void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } private synchronized void flushSomeRegions() { if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) { return; // double check the global memstore size inside of the synchronized block. } ... } {code} was: Here are 2 HBase write performance improvements we recently found. 1) Avoid creating an HBaseConfiguration object for each HLog. Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. In HLog.java: orig: {code:title=HLog.java} newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf)); {code} new: {code} newWriter = createWriter(fs, newPath, conf); {code} 2) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. orig: {code:title=HBaseRpcMetrics.java} public synchronized void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { m = create(name); } m.inc(amt); } {code} new: {code} public void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { synchronized (this) { if ((m = get(name)) == null) { m = create(name); } } } m.inc(amt); } {code} ===== orig: {code:title=MemStoreFlusher.java} public synchronized void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } {code} new: {code} public void reclaimMemStoreMemory() { if (this.server.getGlobalMemstoreSize().get() >= globalMemStoreLimit) { flushSomeRegions(); } } private synchronized void flushSomeRegions() { if (this.server.getGlobalMemstoreSize().get() < globalMemStoreLimit) { return; // double check the global memstore size inside of the synchronized block. } ... } {code} Several HBase write perf improvements Key: HBASE-6968 URL: https://issues.apache.org/jira/browse/HBASE-6968 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Here are 2 recent HBase write performance improvements: 1) Avoid creating an HBaseConfiguration object for each HLog.
Every HBaseConfiguration object parses the xml configuration files from disk when it is created, which is not a cheap operation. In HLog.java: orig: {code:title=HLog.java} newWriter = createWriter(fs, newPath, HBaseConfiguration.create(conf)); {code} new: {code} newWriter = createWriter(fs, newPath, conf); {code} 2) Change 2 hotspot synchronized functions to the double-checked locking pattern, which removes the synchronization overhead in the common case. orig: {code:title=HBaseRpcMetrics.java} public synchronized void inc(String name, int amt) { MetricsTimeVaryingRate m = get(name); if (m == null) { m = create(name); }
[jira] [Created] (HBASE-6930) [89-fb] Avoid acquiring the same row lock repeatedly
Liyin Tang created HBASE-6930: - Summary: [89-fb] Avoid acquiring the same row lock repeatedly Key: HBASE-6930 URL: https://issues.apache.org/jira/browse/HBASE-6930 Project: HBase Issue Type: Bug Reporter: Liyin Tang When processing multiPut, multiMutations or multiDelete operations, each IPC handler thread tries to acquire a lock for each row key in the batch. If the batch contains duplicate row keys, the IPC handler thread previously acquired the same row lock again and again. The optimization is to sort each batch by row key on the client side, and to skip acquiring the same row lock repeatedly on the server side. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
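[Editor's note] A minimal sketch of the server-side idea, assuming the client has already sorted the batch: duplicates become adjacent, so each distinct row lock is acquired exactly once. The types and helpers below are stand-ins, not the HBase internals.

{code:title=RowLockDedupSketch.java}
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: with the batch sorted by row key, duplicate rows are
// adjacent, so the handler acquires each distinct row lock exactly
// once. Byte-array rows and a comparator stand in for the real types.
public class RowLockDedupSketch {
  static void processBatch(byte[][] rows, Comparator<byte[]> cmp) {
    Arrays.sort(rows, cmp);              // done on the client in the real change
    byte[] prev = null;
    for (byte[] row : rows) {
      if (prev == null || cmp.compare(prev, row) != 0) {
        acquireRowLock(row);             // first occurrence only
        prev = row;
      }
      // apply this row's mutation under the already-held lock
    }
    // release all acquired locks afterwards (elided)
  }

  static void acquireRowLock(byte[] row) { /* elided */ }
}
{code}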
[jira] [Created] (HBASE-6911) Set the logging for the location cache hit in the hbase client as trace level
Liyin Tang created HBASE-6911: - Summary: Set the logging for the location cache hit in the hbase client as trace level Key: HBASE-6911 URL: https://issues.apache.org/jira/browse/HBASE-6911 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial The hbase client logs too much for each row-location cache hit, so lower this logging to the trace level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6911) [89-fb] Set the logging for the location cache hit in the hbase client as trace level
[ https://issues.apache.org/jira/browse/HBASE-6911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6911: -- Summary: [89-fb] Set the logging for the location cache hit in the hbase client as trace level (was: Set the logging for the location cache hit in the hbase client as trace level) [89-fb] Set the logging for the location cache hit in the hbase client as trace level - Key: HBASE-6911 URL: https://issues.apache.org/jira/browse/HBASE-6911 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Priority: Trivial The hbase client logs too much for each row-location cache hit, so lower this logging to the trace level. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
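[Editor's note] The shape of such a change is typically a guarded TRACE statement like the fragment below; the message text and variables are illustrative, not the actual diff.

{code}
// Illustrative only: log the per-hit message at TRACE and guard it so
// the string is not even constructed unless TRACE is enabled.
if (LOG.isTraceEnabled()) {
  LOG.trace("Location cache hit for row in table " + tableName);
}
{code}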
[jira] [Created] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
Liyin Tang created HBASE-6858: - Summary: Fix the incorrect BADVERSION checking in the recoverable zookeeper Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
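[Editor's note] Background for the fix, in sketch form: RecoverableZooKeeper retries setData() after a connection loss, and if the first attempt actually succeeded on the server, the retry fails with BADVERSION; the client then reads the znode back and checks whether the data carries its own identifier. The class and method names below are hypothetical, reconstructing only the comparison being fixed, not the real RecoverableZooKeeper code.

{code:title=RecoverableSetDataSketch.java}
import java.util.Arrays;

// Sketch of the retry idea behind RecoverableZooKeeper.setData(), with
// hypothetical names. Writes are tagged with this client's identifier;
// on BADVERSION during a retry we read the znode back and check
// whether our own earlier attempt already succeeded.
public class RecoverableSetDataSketch {
  private final byte[] identifier; // unique per client (and, ideally, per thread)

  RecoverableSetDataSketch(byte[] identifier) {
    this.identifier = identifier;
  }

  boolean ownWriteAlreadyApplied(byte[] dataReadBack) {
    // The fix: compare against our identifier (the ID prefix of the
    // payload), not against unrelated bytes.
    if (dataReadBack == null || dataReadBack.length < identifier.length) {
      return false;
    }
    byte[] idPrefix = Arrays.copyOf(dataReadBack, identifier.length);
    return Arrays.equals(idPrefix, identifier);
  }
}
{code}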
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6858: -- Attachment: HBASE-6858.patch Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-6858: - Assignee: Liyin Tang Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6858: -- Attachment: (was: HBASE-6858.patch) Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6858: -- Attachment: HBASE-6858.patch Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460855#comment-13460855 ] Liyin Tang commented on HBASE-6858: --- Addressed Jimmy's comments. Thanks, Jimmy! Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460872#comment-13460872 ] Liyin Tang commented on HBASE-6858: --- The code is different between 89 and trunk; some variables have been renamed. Let me re-submit the patch! Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.94.2, 0.96.0 Attachments: HBASE-6858.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6858: -- Attachment: HBASE-6858_v2.patch Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460934#comment-13460934 ] Liyin Tang commented on HBASE-6858: --- I agree that this is not a very general solution: it may introduce a race condition if multiple threads in one zk client try to update the same znode with different version numbers, in which case the current code will hide the BADVERSION exception. We didn't find this use case in HBase at the time, roughly 1.5 years ago, and it is cheaper to compare the identifier than to compare the data payload. I also believe that leaving this kind of assumption in the system may introduce, or has already introduced, some uncertainty or bugs, and it is definitely worth improving. Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460938#comment-13460938 ] Liyin Tang commented on HBASE-6858: --- @stack, the latest Hudson run doesn't appear to have built with the latest patch (HBASE-6858_v2.patch). Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460947#comment-13460947 ] Liyin Tang commented on HBASE-6858: --- The recoverable zk originally tried to recover from the connection-loss exception gracefully, and I still believe this problem should be solved by the zookeeper client library rather than by an application such as HBase. A third option would be to check whether the latest zookeeper can already recover from the connection-loss exception gracefully; in that case, we could simply remove the recoverable zk entirely! Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6858) Fix the incorrect BADVERSION checking in the recoverable zookeeper
[ https://issues.apache.org/jira/browse/HBASE-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang updated HBASE-6858: -- Attachment: HBASE-6858_v3.patch Compare the entire data (identifier + data payload) together, as discussed. In addition, we may need to append the thread id to the identifier. Fix the incorrect BADVERSION checking in the recoverable zookeeper -- Key: HBASE-6858 URL: https://issues.apache.org/jira/browse/HBASE-6858 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Liyin Tang Assignee: Liyin Tang Fix For: 0.96.0 Attachments: HBASE-6858.patch, HBASE-6858_v2.patch, HBASE-6858_v3.patch Thanks to Stack and Kaka for reporting a bug in the recoverable zookeeper when handling the BADVERSION exception for setData(): it should compare the ID payload of the data in zk with its own identifier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
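[Editor's note] In sketch form, the v3 comparison described above (the helper name is illustrative, not the actual RecoverableZooKeeper code):

{code}
// v3 idea: on BADVERSION, compare the entire znode content
// (identifier + data payload) against what this client attempted to
// write, rather than only the identifier prefix.
boolean ownWriteAlreadyApplied(byte[] dataReadBack, byte[] dataWeWrote) {
  return java.util.Arrays.equals(dataReadBack, dataWeWrote);
}
{code}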
[jira] [Created] (HBASE-6673) Clear up the invalid ResultScanner in the ThriftServerRunner
Liyin Tang created HBASE-6673: - Summary: Clear up the invalid ResultScanner in the ThriftServerRunner Key: HBASE-6673 URL: https://issues.apache.org/jira/browse/HBASE-6673 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Clear up the invalid ResultScanner in the ThriftServerRunner -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6556) Avoid ssh to localhost in startup scripts
[ https://issues.apache.org/jira/browse/HBASE-6556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang resolved HBASE-6556. --- Resolution: Duplicate Avoid ssh to localhost in startup scripts - Key: HBASE-6556 URL: https://issues.apache.org/jira/browse/HBASE-6556 Project: HBase Issue Type: Improvement Components: scripts Environment: Mac OSX Mountain Lion, HBase 89-fb Reporter: Ramkumar Vadali Priority: Trivial The use of ssh in scripts like zookeepers.sh and regionservers.sh for a single node setup is not necessary. We can execute the command directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6555) Avoid ssh to localhost in startup scripts
[ https://issues.apache.org/jira/browse/HBASE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438216#comment-13438216 ] Liyin Tang commented on HBASE-6555: --- Hi Ramkumar, I have resolved 6556, 6557 and 6558 as duplicate jiras. Avoid ssh to localhost in startup scripts - Key: HBASE-6555 URL: https://issues.apache.org/jira/browse/HBASE-6555 Project: HBase Issue Type: Improvement Components: scripts Environment: Mac OSX Mountain Lion, HBase 89-fb Reporter: Ramkumar Vadali Priority: Trivial The use of ssh in scripts like zookeepers.sh and regionservers.sh for a single node setup is not necessary. We can execute the command directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6558) Avoid ssh to localhost in single node setup.
[ https://issues.apache.org/jira/browse/HBASE-6558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang resolved HBASE-6558. --- Resolution: Duplicate Avoid ssh to localhost in single node setup. - Key: HBASE-6558 URL: https://issues.apache.org/jira/browse/HBASE-6558 Project: HBase Issue Type: Improvement Components: scripts Environment: mac osx mountain lion, hbase 89-fb Reporter: Ramkumar Vadali Priority: Trivial Original Estimate: 24h Remaining Estimate: 24h The use of ssh in scripts like zookeepers.sh and regionservers.sh for a single node setup is not necessary. We can execute the command directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-6557) Avoid ssh to localhost in single node setup.
[ https://issues.apache.org/jira/browse/HBASE-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang resolved HBASE-6557. --- Resolution: Duplicate Avoid ssh to localhost in single node setup. - Key: HBASE-6557 URL: https://issues.apache.org/jira/browse/HBASE-6557 Project: HBase Issue Type: Improvement Components: scripts Environment: mac osx mountain lion, hbase 89-fb Reporter: Ramkumar Vadali Priority: Trivial The use of ssh in scripts like zookeepers.sh and regionservers.sh for a single node setup is not necessary. We can execute the command directly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-6361) Change the compaction queue to a round robin scheduler
[ https://issues.apache.org/jira/browse/HBASE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Tang reassigned HBASE-6361: - Assignee: Akashnil Change the compaction queue to a round robin scheduler -- Key: HBASE-6361 URL: https://issues.apache.org/jira/browse/HBASE-6361 Project: HBase Issue Type: Improvement Reporter: Akashnil Assignee: Akashnil Currently, compaction requests are submitted to the minor/major compaction queue of a region-server from every column-family/region belonging to it, and the requests are processed from the queue in FIFO (first in, first out) order. We want to replace the current queue-based scheduler with a lazy one. The idea of lazy scheduling is that it is always better to defer a decision (compaction selection) if the decision only becomes relevant later. Presently, when the queue gets bottle-necked, there is a delay between the compaction selection of a request and its execution. Instead, we can postpone the compaction selection until the queue is empty, at which point we will have more information and more choices (new flush files will have arrived by then) and can make a better decision. Removing the queue, we propose to implement a round-robin scheduler: all the column families in their regions are visited in sequence, periodically, and in each visit, if the column family generates a valid compaction request, the request is executed before moving to the next one. We do not plan to change the current compaction algorithm for now; we expect that it will automatically make better decisions when doing just-in-time selection. How do we know that? Consider an example. Suppose there is a short-term bottleneck that blocks the queue for a period of time (let the min-files for compaction be 4). For an active column family, as new flushes are written, new compaction requests, each of size 4, are added to the queue continuously until the queue starts processing them. Now consider a round-robin scheduler: a bottleneck in compaction IO instead shows up as a longer latency before the same column family is visited again. When the same active column family is visited after a long delay, suppose 16 new flush files have been written there. The compaction selection algorithm will select one compaction request of size 16, as opposed to the 4 compaction requests of size 4 that would have been generated in the previous case. A compaction of 16 flush files is more IOPs-efficient than the same set of files being compacted 4 at a time: both consume the same total amount of reads and writes, but the former produces one file of size 16 while the latter produces 4 files of size 4. So, in the second case, we obtained the 4*4 -> 16 compaction for free. With the queue, those smaller size-4 files would have consumed additional IOPs to be compacted into bigger files later. On my simulator, I ran some experiments on how a bottleneck in the queue affects compaction selections in the current system. It appears that a filled-up queue makes all future compaction selections less and less efficient in terms of IOPs, resulting in a runaway positive-feedback loop which can potentially explode the compaction queue. (This was also observed in production recently.) The main effect of this change should be to handle bursty loads better: when a bottleneck occurs, compaction selection becomes more IOPs-efficient rather than less, resulting in negative feedback and an easier return to stability.
As for monitoring, the compaction queue size will no longer be present as a metric. However, the number of files in each compaction will indicate whether a bottleneck has occurred. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
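[Editor's note] To make the proposal concrete, here is a minimal sketch of a round-robin, just-in-time scheduler. Store is a stand-in interface and all names are illustrative assumptions, not the HBase compaction classes.

{code:title=RoundRobinCompactionSketch.java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch of the proposed scheduler: visit stores in a fixed cycle and,
// on each visit, run compaction selection just-in-time instead of
// dequeuing a selection that was made while the request sat in a FIFO
// queue.
public class RoundRobinCompactionSketch {
  interface Store {
    List<String> selectCompaction(); // empty list when nothing to do
    void compact(List<String> files);
  }

  private final Deque<Store> cycle = new ArrayDeque<Store>();

  RoundRobinCompactionSketch(List<Store> stores) {
    cycle.addAll(stores);
  }

  // One scheduling step: take the next store, select *now* (so every
  // flush that accumulated since the last visit is a candidate),
  // compact if the selection is non-empty, then requeue the store.
  void runOnce() {
    Store s = cycle.pollFirst();
    if (s == null) {
      return;
    }
    List<String> selection = s.selectCompaction(); // just-in-time decision
    if (!selection.isEmpty()) {
      s.compact(selection);
    }
    cycle.addLast(s);
  }
}
{code}

The point of the structure is that selection runs at execution time, so all flushes that arrived while other stores were being serviced become candidates together, which is what yields the larger, more IOPs-efficient compactions described in the proposal.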