[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121745#comment-13121745 ]

dhruba borthakur commented on HBASE-4528:

Hi Ted, if HLog.sync() throws an exception, then it is not clear whether the transaction made it to the disk or not, but the changes to memstore are already made. So, it won't help to catch the exception from HLog.sync() and continue; instead it is better to fail completely, do you agree?

Kannan: thanks for pointing out the problem related to memstore flushes. I had a solution for this deployed internally, and will upload a new patch that includes the fix. Essentially, the flush will wait for all currently running transactions to quiesce before committing the flush.

The put operation can release the rowlock before sync-ing the Hlog

Key: HBASE-4528
URL: https://issues.apache.org/jira/browse/HBASE-4528
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt

This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4545) TestHLog doesn't clean up after itself
[ https://issues.apache.org/jira/browse/HBASE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary Helmling resolved HBASE-4545.

Resolution: Fixed
Fix Version/s: 0.94.0
Assignee: Gary Helmling
Hadoop Flags: Reviewed

Committed to trunk.

TestHLog doesn't clean up after itself

Key: HBASE-4545
URL: https://issues.apache.org/jira/browse/HBASE-4545
Project: HBase
Issue Type: Test
Components: test
Reporter: Gary Helmling
Assignee: Gary Helmling
Fix For: 0.94.0
Attachments: HBASE-4545.patch

TestHLog has been hanging during shutdown of the mini cluster after all tests are run. Further investigation shows that there are many places where the TestHLog tests are not cleaning up after themselves. Necessary changes are:
* since all tests use HLog directly, a MiniHBaseCluster is not needed. The test should only launch a MiniDFSCluster
* several tests do not close the created HLog at completion
* the test class should shut down the mini cluster in an @AfterClass method
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121752#comment-13121752 ]

Matt Corgan commented on HBASE-4218:

Jacek - have you done anything with the KeyValue/scanner/searching interfaces? I'm curious to see your approach. Like you, I'm materializing the iterator's current cell, but the materialized row/family/qualifier/timestamp/type/value all reside in separate arrays/fields. The scanner can only materialize one cell at a time, which I think can work long term but doesn't play well with some of the current scanner interfaces. The problem can be dodged by spawning a new array and copying everything into the KeyValue format, but we would see a massive speedup and could possibly eliminate all object instantiation (and furious garbage collection) if we could do comparisons on the intermediate arrays. I've mocked up some cell interfaces and comparators but am wondering what you've already got in progress.

Regarding scanners - supported operations on a block are next(), previous(), nextRow(), previousRow(), positionAt(KeyValue kv, boolean beforeIfMiss), and some others. The main problem is that I can't peek(), which is used in the current version of the KeyValue heap, though I've mocked an alternate approach without it. I'm also starting to think that a traditional iterator's hasNext() method should not be supported so that true streaming can be done and so that blocks don't need to know about their neighbors.

Delta Encoding of KeyValues (aka prefix compression)

Key: HBASE-4218
URL: https://issues.apache.org/jira/browse/HBASE-4218
Project: HBase
Issue Type: Improvement
Components: io
Reporter: Jacek Migdal
Labels: compression

A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as speeding seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter.

Initial tests on real data (key length = ~ 90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression:
key compression ratio: 92%
total compression ratio: 85%
LZO on the same data: 85%
LZO after delta encoding: 91%
While having much better performance (20-80% faster decompression ratio than LZO). Moreover, it should allow far more efficient seeking which should improve performance a bit.

It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields).

In order to implement it in HBase two important changes in design will be needed:
- solidify interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to uncompressed buffer in HFileBlock will have bad performance
- extend comparators to support comparison assuming that N first bytes are equal (or some fields are equal)

Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
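The prefix-compression idea in the description can be illustrated with a minimal sketch. It is not the encoder from the HBASE-4218 patch: the class `PrefixDeltaSketch`, its `Entry` type, and the sample keys are hypothetical, and only the prefix part is shown (no timestamp diffs or bitfields). Each key in a sorted run is stored as the number of leading bytes it shares with the previous key plus the remaining suffix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of prefix compression over sorted keys. */
class PrefixDeltaSketch {

    /** One encoded key: bytes shared with the previous key, plus the differing suffix. */
    static final class Entry {
        final int sharedPrefixLength;
        final byte[] suffix;

        Entry(int sharedPrefixLength, byte[] suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Number of leading bytes that a and b have in common. */
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Encodes keys (assumed sorted) relative to their predecessor. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (byte[] key : sortedKeys) {
            int shared = commonPrefix(prev, key);
            out.add(new Entry(shared, Arrays.copyOfRange(key, shared, key.length)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by replaying the entries in order. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> out = new ArrayList<>();
        byte[] prev = new byte[0];
        for (Entry e : entries) {
            byte[] key = new byte[e.sharedPrefixLength + e.suffix.length];
            System.arraycopy(prev, 0, key, 0, e.sharedPrefixLength);
            System.arraycopy(e.suffix, 0, key, e.sharedPrefixLength, e.suffix.length);
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row0001/cf:a", "row0001/cf:b", "row0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry e : encode(keys)) {
            System.out.println(e.sharedPrefixLength + " shared, suffix "
                + new String(e.suffix, StandardCharsets.UTF_8));
        }
    }
}
```

Because decoding replays entries in order, a reader positioned at entry N already knows how many leading bytes match entry N-1, which is the property the proposed comparator extension ("assuming that N first bytes are equal") would rely on.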
[jira] [Updated] (HBASE-4482) Race Condition Concerning Eviction in SlabCache
[ https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Li Pi updated HBASE-4482:

Attachment: hbase-4482v4.2.txt

final version.

Race Condition Concerning Eviction in SlabCache

Key: HBASE-4482
URL: https://issues.apache.org/jira/browse/HBASE-4482
Project: HBase
Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Blocker
Fix For: 0.92.0
Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121762#comment-13121762 ]

Lars Hofhansl commented on HBASE-4536:

That might be getting hard to understand. minVersions would have a slightly different meaning depending on whether that extra flag is set. Without the flag, minVersions is like maxVersions for deleted rows; not sure who would need that.

Having just the flag for deleted rows would also make the code easier to follow, as the column tracker would no longer need to distinguish between normal rows, delete markers, and deleted rows as it does in the current patch, but only between rows (deleted or not) and delete markers.

Allow CF to retain deleted rows

Key: HBASE-4536
URL: https://issues.apache.org/jira/browse/HBASE-4536
Project: HBase
Issue Type: Sub-task
Components: regionserver
Affects Versions: 0.92.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.92.0, 0.94.0

Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error.

I see two options here:
1. Add a new flag to HColumnDescriptor. Something like RETAIN_DELETED.
2. Fold this into the parent change. I.e. keep the minimum number of versions even past the delete marker.

#1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint).

Comments? Any other options?
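To make option 1 concrete, here is a minimal sketch of how a retain-deleted flag could change which versions survive a major compaction. Everything in it is an assumption for illustration: the class `RetainDeletedSketch`, the method `survivors`, and the rule that a delete marker covers versions with a timestamp less than or equal to its own. It ignores TTL, minVersions and maxVersions, which the real patch would still have to apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a compaction filter with a retain-deleted flag. */
class RetainDeletedSketch {

    /**
     * Returns the version timestamps that survive compaction.
     * Assumption: a delete marker covers every version whose timestamp
     * is less than or equal to the marker's timestamp.
     */
    static List<Long> survivors(List<Long> versionTimestamps,
                                long deleteMarkerTimestamp,
                                boolean retainDeleted) {
        List<Long> kept = new ArrayList<>();
        for (long ts : versionTimestamps) {
            boolean coveredByDelete = ts <= deleteMarkerTimestamp;
            // Without the flag, covered versions are dropped; with it, they stay.
            if (!coveredByDelete || retainDeleted) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(10L, 20L, 30L);
        System.out.println("flag off: " + survivors(versions, 20L, false));
        System.out.println("flag on:  " + survivors(versions, 20L, true));
    }
}
```

With a single flag the filter only needs to tell data versions from delete markers, which matches the simplification to the column tracker that the comment argues for.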
[jira] [Updated] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HBASE-4528:

Attachment: appendNoSyncPut3.txt

1. The flush of memstore waits for current transactions to quiesce before committing the flushed files. This should address the problem pointed out by Kannan.
2. The Hlog.syncer() does not throw an exception, instead causes the regionserver to exit if it is unable to sync to hdfs. The assumption here is that if hbase is unable to write/sync to hdfs, then the simplest and correct error recovery is to exit. (For example, if the memstore flush fails, the regionserver exits)

The put operation can release the rowlock before sync-ing the Hlog

Key: HBASE-4528
URL: https://issues.apache.org/jira/browse/HBASE-4528
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt

This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
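Point 1 above (the flush waits for running transactions to quiesce) can be sketched with a read-write lock. The message does not say how the patch implements the wait, so this is only one possible shape: the class `FlushQuiesceSketch` and its method names are hypothetical. Each transaction holds the shared lock while in flight, and the flush commit takes the exclusive lock, which is granted only once no transaction holds the shared side.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: flush commit waits until no transaction is in flight. */
class FlushQuiesceSketch {
    private final ReentrantReadWriteLock inFlight = new ReentrantReadWriteLock();

    /** Called when a put starts; many transactions may be in flight at once. */
    void beginTransaction() {
        inFlight.readLock().lock();
    }

    /** Called when the put has finished (including its log sync). */
    void endTransaction() {
        inFlight.readLock().unlock();
    }

    /** Blocks until all running transactions have ended, then commits. */
    void commitFlush(Runnable commit) {
        inFlight.writeLock().lock();
        try {
            commit.run();
        } finally {
            inFlight.writeLock().unlock();
        }
    }

    /** Non-blocking variant, handy for demonstrating the rule in one thread. */
    boolean tryCommitFlush(Runnable commit) {
        if (!inFlight.writeLock().tryLock()) {
            return false; // some transaction is still running
        }
        try {
            commit.run();
            return true;
        } finally {
            inFlight.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        FlushQuiesceSketch region = new FlushQuiesceSketch();
        region.beginTransaction();
        System.out.println("flush while a put is running: "
            + region.tryCommitFlush(() -> { }));
        region.endTransaction();
        System.out.println("flush after the put ended: "
            + region.tryCommitFlush(() -> { }));
    }
}
```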
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121789#comment-13121789 ]

jirapos...@reviews.apache.org commented on HBASE-4528:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2141/

(Updated 2011-10-06 08:08:49.288861)

Review request for hbase.

Changes

1. The flush of memstore waits for current transactions to quiesce before committing the flushed files. This should address the problem pointed out by Kannan.
2. The Hlog.syncer() does not throw an exception, instead causes the regionserver to exit if it is unable to sync to hdfs. The assumption here is that if hbase is unable to write/sync to hdfs, then the simplest and correct error recovery is to exit. (For example, if the memstore flush fails, the regionserver exits)

Summary

This changes the multiPut operation so that the sync to the wal occurs outside the rowlock.

This enhancement is done only to HRegion.mut(Put[]) because this is the only method that gets invoked from an application. The HRegion.put(Put) is used only by unit tests and should possibly be deprecated.

I have attached a unit test. I have not yet run all unit tests, but early feedback on this patch will be very helpful.

This addresses bug HBASE-4528.
https://issues.apache.org/jira/browse/HBASE-4528

Diffs (updated)

/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1179529
/src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1179529
/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1179529
/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1179529
/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1179529
/src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION
/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1179529

Diff: https://reviews.apache.org/r/2141/diff

Testing

Not yet run the full suite of unit tests.

Thanks,
Dhruba

The put operation can release the rowlock before sync-ing the Hlog

Key: HBASE-4528
URL: https://issues.apache.org/jira/browse/HBASE-4528
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt

This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
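The ordering described in the summary (append to the WAL and update the memstore under the row lock, sync after releasing it) can be sketched as below. This is a toy model, not HRegion code: `AppendThenSyncSketch`, `ToyLog`, and the string-based edits are hypothetical, and read visibility (the ReadWriteConsistencyControl part of the patch) is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: hold the row lock for the append only, sync afterwards. */
class AppendThenSyncSketch {

    /** Toy write-ahead log: append() only buffers, sync() makes edits durable. */
    static final class ToyLog {
        private final List<String> buffered = new ArrayList<>();
        private long lastAppended = 0;
        private long lastSynced = 0;
        int syncCalls = 0;

        synchronized long append(String edit) {
            buffered.add(edit);
            return ++lastAppended; // transaction id of this edit
        }

        synchronized void sync(long txid) {
            if (txid <= lastSynced) {
                return; // an earlier sync already covered this edit
            }
            // A real log would flush to the file system here. Per the thread,
            // a failure at this point is treated as fatal for the server.
            buffered.clear();
            lastSynced = lastAppended;
            syncCalls++;
        }

        synchronized long lastSynced() {
            return lastSynced;
        }
    }

    final ToyLog log = new ToyLog();
    final Map<String, String> memstore = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    void put(String row, String value) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        long txid;
        lock.lock();
        try {
            txid = log.append(row + "=" + value); // cheap: no disk wait
            memstore.put(row, value);
        } finally {
            lock.unlock(); // other writers to this row may proceed now
        }
        log.sync(txid); // the slow part runs outside the row lock
    }

    public static void main(String[] args) {
        AppendThenSyncSketch region = new AppendThenSyncSketch();
        region.put("hot-row", "v1");
        region.put("hot-row", "v2");
        System.out.println("value=" + region.memstore.get("hot-row")
            + " lastSynced=" + region.log.lastSynced());
    }
}
```

Since one sync covers every edit appended before it, writers to a hot row that queue up behind a slow sync can have their edits made durable together, which is a plausible source of the throughput gain quoted in the issue.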
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated HBASE-1744:

Attachment: HBASE-1744.4.patch

Added patch with Bob Copeland's fixes and updated to trunk

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Lars Francke
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated HBASE-1744:

Assignee: Tim Sell (was: Lars Francke)
Status: Patch Available (was: Open)

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121851#comment-13121851 ]

Tim Sell commented on HBASE-1744:

Thanks Bob, my patch still doesn't have tests, looking at that now.

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121968#comment-13121968 ]

jirapos...@reviews.apache.org commented on HBASE-4528:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2141/#review2390

/src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java
https://reviews.apache.org/r/2141/#comment5491

If advanceMemstore() returns true above, can we skip this call?

- Ted

On 2011-10-06 08:08:49, Dhruba Borthakur wrote:
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2141/
bq.
bq. (Updated 2011-10-06 08:08:49)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq.
bq. The changes the multiPut operation so that the sync to the wal occurs outside the rowlock.
bq.
bq. This enhancement is done only to HRegion.mut(Put[]) because this is the only method that gets invoked from an application. The HRegion.put(Put) is used only by unit tests and should possibly be deprecated.
bq.
bq. I have attached a unit test. I have not yet run all unit tests, but early feedback on this patch will be very helpful.
bq.
bq. This addresses bug HBASE-4528.
bq. https://issues.apache.org/jira/browse/HBASE-4528
bq.
bq. Diffs
bq.
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1179529
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1179529
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1179529
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1179529
bq. /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1179529
bq. /src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION
bq. /src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1179529
bq.
bq. Diff: https://reviews.apache.org/r/2141/diff
bq.
bq. Testing
bq.
bq. Not yet run the full suite of unit tests.
bq.
bq. Thanks,
bq.
bq. Dhruba

The put operation can release the rowlock before sync-ing the Hlog

Key: HBASE-4528
URL: https://issues.apache.org/jira/browse/HBASE-4528
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt

This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache
[ https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122034#comment-13122034 ]

Jonathan Gray commented on HBASE-4482:

+1 on keeping this in 0.92 regardless of stability and marking as experimental.

Race Condition Concerning Eviction in SlabCache

Key: HBASE-4482
URL: https://issues.apache.org/jira/browse/HBASE-4482
Project: HBase
Issue Type: Sub-task
Reporter: Li Pi
Assignee: Li Pi
Priority: Blocker
Fix For: 0.92.0
Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122048#comment-13122048 ]

Ted Yu commented on HBASE-1744:

I applied patch v4 and got the following: http://pastebin.com/rmpXff7m

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated HBASE-1744:

Status: Open (was: Patch Available)

Weird. Strange that it works for me. It turns out I am compiling with thrift 0.6.1 and hbase is using 0.7, so I'm cancelling the patch. I'll stick a new one up in a bit.

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Sell updated HBASE-1744:

Attachment: HBASE-1744.5.patch

Added patch which has the thrift2 generated files generated from 0.7.0. It also has a few tests.

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122075#comment-13122075 ]

jirapos...@reviews.apache.org commented on HBASE-4469:

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2235/

Review request for hbase.

Summary

The problem is that when seeking for the row/col in the hfile, we will go to the top of the row in order to check for a row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row has already been added to the bloomfilter. We can take advantage of this fact to avoid seeking to the top of the row.

Also update the TestBlocksRead unit tests, since most of the block read counts have dropped to a lower number.

Evaluation:
In TestSeekingOptimization, it previously saved 31.6% of seek operations. Now it saves about 41.82% of seek operations, about 10 percentage points more.

Before this diff:
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

Apply this diff:
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%

Thanks to Mikhail and Kannan for their help and discussion.

This addresses bug HBASE-4469.
https://issues.apache.org/jira/browse/HBASE-4469

Diffs

src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 7b0b9e6
src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 8dd8a68
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java abccea4

Diff: https://reviews.apache.org/r/2235/diff

Testing

Ran all the unit tests. 2 unit tests fail both with and without my change:
TestDistributedLogSplitting
TestHTablePool

Thanks,
Liyin

Avoid top row seek by looking up bloomfilter

Key: HBASE-4469
URL: https://issues.apache.org/jira/browse/HBASE-4469
Project: HBase
Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang

The problem is that when seeking for the row/col in the hfile, we will go to the top of the row in order to check for a row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row has already been added to the bloomfilter. We can take advantage of this fact to avoid seeking to the top of the row.
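The optimization in the summary can be pictured with a toy model that counts seeks. It is an illustration under stated assumptions, not the StoreFileScanner change: the class `DeleteFamilyBloomSketch` is hypothetical, and a `HashSet` stands in for the Bloom filter. A real Bloom filter can report false positives but never false negatives, so a negative answer is always safe to act on, while a positive answer only costs an extra seek.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: skip the top-of-row seek when the filter rules out a delete. */
class DeleteFamilyBloomSketch {
    /** Stand-in for a Bloom filter keyed by rows that carry a delete-family marker. */
    private final Set<String> rowsWithDeleteFamily = new HashSet<>();

    /** Number of seeks performed so far. */
    int seeks = 0;

    void recordDeleteFamily(String row) {
        rowsWithDeleteFamily.add(row);
    }

    /** Positions the scanner at row/column and returns that position. */
    String seek(String row, String column) {
        if (rowsWithDeleteFamily.contains(row)) {
            // The row may hold a delete-family marker: visit the top of the row first.
            seeks++;
        }
        // Always seek to the requested column itself.
        seeks++;
        return row + "/" + column;
    }

    public static void main(String[] args) {
        DeleteFamilyBloomSketch scanner = new DeleteFamilyBloomSketch();
        scanner.recordDeleteFamily("r1");
        scanner.seek("r2", "c"); // one seek: no delete marker possible
        scanner.seek("r1", "c"); // two seeks: top of row, then the column
        System.out.println("total seeks: " + scanner.seeks);
    }
}
```

The reported figures are consistent with each other: 1714 / 2506 is about 68.40% and 1458 / 2506 is about 58.18%, giving savings of 31.60% and 41.82%.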
[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.
[ https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122083#comment-13122083 ]

Ted Yu commented on HBASE-1744:

TestThriftHBaseServiceHandler passed for patch v5.

Can we change the wording for:
{code}
+ echo thrift2 run the new HBase Thrift server
{code}
I believe there would be a newer Thrift server down the road :-)

Also, experience using thrift2 would be helpful for other users.

Thrift server to match the new java api.

Key: HBASE-1744
URL: https://issues.apache.org/jira/browse/HBASE-1744
Project: HBase
Issue Type: Improvement
Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
Fix For: 0.94.0
Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch

This mutateRows, etc. is a little confusing compared to the new cleaner java client. Thinking of ways to make a thrift client that is just as elegant. Something like:
void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
with:
struct TColumn { 1:Bytes family, 2:Bytes qualifier, 3:i64 timestamp }
struct TPut { 1:Bytes row, 2:map<TColumn, Bytes> values }
This creates more verbose rpc than if the columns in TPut were just map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and still be intuitive from say python. Presumably the goal of a thrift gateway is to be easy first.
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122088#comment-13122088 ] Ted Yu commented on HBASE-4469: --- I don't see TestBlocksRead in the latest review. Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.
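The idea in the HBASE-4469 description can be sketched roughly as follows. This is an illustration only, not the patch: the class DeleteFamilyBloomSketch is hypothetical, a HashSet stands in for the real bloom filter, and recording the delete-family marker under the row with an empty qualifier is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

final class DeleteFamilyBloomSketch {
    // Stand-in for the store file's bloom filter. A real bloom filter may also
    // report false positives; that only costs an extra seek, never a missed delete.
    private final Set<String> bloomKeys = new HashSet<>();

    private static String key(String row, String qualifier) {
        return row + "|" + qualifier;
    }

    /** A normal put adds its row/col key to the filter. */
    void put(String row, String qualifier) {
        bloomKeys.add(key(row, qualifier));
    }

    /** Assumption: a delete-family marker is recorded under the row with an empty qualifier. */
    void deleteFamily(String row) {
        bloomKeys.add(key(row, ""));
    }

    /** True only if the filter says a delete-family marker might exist for this row. */
    boolean needsTopOfRowSeek(String row) {
        return bloomKeys.contains(key(row, ""));
    }
}
```

When the filter answers "definitely absent" for the delete-family key, the reader can go straight to the requested row/col instead of first seeking to the top of the row.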
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122106#comment-13122106 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/ --- Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray. Summary --- Fix for handling HBASE-4539 and HBASE-4540. Ran all the testcases. Added one new testcase to verify OpenedRegionHandler scenarios. Also addresses Ted's comments. This addresses bug HBASE-4540. https://issues.apache.org/jira/browse/HBASE-4540 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179238 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179238 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179238 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179238 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION Diff: https://reviews.apache.org/r/2251/diff Testing --- Yes Thanks, ramkrishna OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch - OpenedRegionHandler has not yet deleted the znode of the region R1 opened by RS1. - RS1 goes down. - Servershutdownhandler assigns the region R1 to RS2. - The znode of R1 is moved to OFFLINE state by master or OPENING state by RS2 if RS2 has started opening the region. - Now the first OpenedRegionHandler tries to delete the znode thinking its in OPENED state but fails. 
- Though the delete fails, it still removes the node from RIT and adds RS1 as the owner of R1 in master's memory. - Now when RS2 completes opening the region, the master is not able to open the region because the region has already been deleted from RIT. {code} Master == 2011-10-05 20:49:45,301 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of linux146,60020,1317827727647 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not running balancer because 1 region(s) in transition: {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9. state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847} 2011-10-05 20:49:57,720 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=linux76,6,1317827742012, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:14,501 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Deleting existing unassigned node for 3e69d628a8bd8e9b7c5e7a2a6e03aad9 that is in expected state RS_ZK_REGION_OPENED 2011-10-05 20:50:14,505 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x132d3dc13090023 Attempting to delete unassigned node 3e69d628a8bd8e9b7c5e7a2a6e03aad9 in RS_ZK_REGION_OPENED state but node is in RS_ZK_REGION_OPENING state After the region is opened in RS2 = 2011-10-05 20:50:48,066 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9, which is more than 15 seconds late 2011-10-05 20:50:48,290 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null and not in expected PENDING_OPEN or OPENING states 2011-10-05 20:50:53,743 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, 
server=linux76,60020,1317827746847, region=3e69d628a8bd8e9b7c5e7a2a6e03aad9 2011-10-05 20:50:54,182 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s) 2011-10-05 20:50:54,397 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received OPENING for region 3e69d628a8bd8e9b7c5e7a2a6e03aad9 from server linux76,60020,1317827746847 but region was in the state null
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122123#comment-13122123 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2395 --- http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5498 The two tests share a lot of the same code, some refactoring would be good http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5497 You should be resetting the conf to what was created inside TEST_UTIL. - Jean-Daniel OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122147#comment-13122147 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2399 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5504 Can this debugLog be folded into the one above ? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5505 Remove this extra line. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java https://reviews.apache.org/r/2251/#comment5506 'for transition ZK node' seems redundant. - Ted OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
[jira] [Commented] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master
[ https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122150#comment-13122150 ] jirapos...@reviews.apache.org commented on HBASE-4070: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2029/#review2398 --- Looks good to me. Ship it after some minor fixes. src/main/java/org/apache/hadoop/hbase/ClusterStatus.java https://reviews.apache.org/r/2029/#comment5502 masterCoprocessors is a string array. I don't think you can use equals() here. If the 2 arrays are sorted, you may use Arrays.equals(). src/main/java/org/apache/hadoop/hbase/HServerLoad.java https://reviews.apache.org/r/2029/#comment5507 This is cool. Array.toString() can return null. You want to check? src/main/java/org/apache/hadoop/hbase/HServerLoad.java https://reviews.apache.org/r/2029/#comment5509 As above: Arrays.toString() can be null. src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java https://reviews.apache.org/r/2029/#comment5508 can you finish the TODO? - Mingjie On 2011-10-05 21:45:30, Eugene Koontz wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2029/ bq. --- bq. bq. (Updated 2011-10-05 21:45:30) bq. bq. bq. Review request for hbase and Mingjie Lai. bq. bq. bq. Summary bq. --- bq. bq. Proposed fix for HBASE-4070. bq. bq. bq. This addresses bug HBASE-4070. bq. https://issues.apache.org/jira/browse/HBASE-4070 bq. bq. bq. Diffs bq. - bq. 
bq.src/main/jamon/org/apache/hbase/tmpl/master/MasterStatusTmpl.jamon abeb850 bq.src/main/jamon/org/apache/hbase/tmpl/regionserver/RSStatusTmpl.jamon be6fceb bq.src/main/java/org/apache/hadoop/hbase/ClusterStatus.java 01bc1dd bq.src/main/java/org/apache/hadoop/hbase/HServerLoad.java 0c680e4 bq.src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 bq.src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java dbae4fd bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java f80d232 bq.src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 3840279 bq.src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java eda5a9b bq. bq. Diff: https://reviews.apache.org/r/2029/diff bq. bq. bq. Testing bq. --- bq. bq. Two new tests : testRegionServerCoprocessorReported() and testMasterServerCoprocessorsReported() included in a new source file src/test/java/o.a.h.h/coprocessor/TestCoprocessorReporting.java. bq. bq. bq. Thanks, bq. bq. Eugene bq. bq. [Coprocessors] Improve region server metrics to report loaded coprocessors to master Key: HBASE-4070 URL: https://issues.apache.org/jira/browse/HBASE-4070 Project: HBase Issue Type: Improvement Affects Versions: 0.90.3 Reporter: Mingjie Lai Assignee: Eugene Koontz Attachments: HBASE-4070.patch, HBASE-4070.patch, HBASE-4070.patch, master-web-ui.jpg, rs-status-web-ui.jpg HBASE-3512 is about listing loaded cp classes at shell. To make it more generic, we need a way to report this piece of information from region to master (or just at region server level). So later on, we can display the loaded class names at shell as well as web console. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
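The review point about comparing the coprocessor name arrays can be illustrated with a small Java sketch. The helper name sameCoprocessors is hypothetical and this is not the code from the patch; it only shows why equals() on an array is not enough and how Arrays.equals() behaves once both sides are sorted. As far as the JDK documentation goes, Arrays.toString() yields the string "null" for a null array rather than a null reference, so that case mainly affects what gets displayed.

```java
import java.util.Arrays;

final class CoprocessorArrayCompare {
    /**
     * Order-insensitive comparison of two coprocessor name arrays.
     * Copies are sorted so the caller's arrays are left untouched.
     */
    static boolean sameCoprocessors(String[] a, String[] b) {
        if (a == null || b == null) {
            return a == b; // equal only when both are null
        }
        String[] x = a.clone();
        String[] y = b.clone();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y); // element-wise comparison
    }

    public static void main(String[] args) {
        String[] a = {"A", "B"};
        String[] b = {"B", "A"};
        System.out.println(a.equals(b));            // identity comparison on arrays
        System.out.println(sameCoprocessors(a, b)); // content comparison after sorting
    }
}
```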
[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing
[ https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122156#comment-13122156 ] jirapos...@reviews.apache.org commented on HBASE-4540: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2251/#review2400 --- http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java https://reviews.apache.org/r/2251/#comment5510 This new method is similar to deleteNode() above. Maybe we should retrofit the existing deleteNode() by adding expectedVersion ? We can designate some negative constant to signify that version check should be skipped. - Ted OpenedRegionHandler is not enforcing atomicity of the operation it is performing Key: HBASE-4540 URL: https://issues.apache.org/jira/browse/HBASE-4540 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4540_1.patch
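The expectedVersion suggestion in this review amounts to a compare-and-delete. Below is a minimal in-memory Java sketch of that idea under stated assumptions: VersionedNodeStore, ANY_VERSION and the method shapes are hypothetical and do not reproduce ZKAssign or the patch. ZooKeeper's own delete call takes a version argument in which -1 matches any version, which is what the negative sentinel here imitates.

```java
import java.util.HashMap;
import java.util.Map;

final class VersionedNodeStore {
    /** Hypothetical sentinel: skip the version check. */
    static final int ANY_VERSION = -1;

    private static final class Node {
        String state;
        int version;

        Node(String state) {
            this.state = state;
        }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    synchronized void create(String path, String state) {
        nodes.put(path, new Node(state));
    }

    /** Changes the state and bumps the version, as every znode write does. */
    synchronized int setState(String path, String state) {
        Node n = nodes.get(path);
        if (n == null) {
            throw new IllegalStateException("no node at " + path);
        }
        n.state = state;
        return ++n.version;
    }

    synchronized boolean exists(String path) {
        return nodes.containsKey(path);
    }

    /** Deletes only if both the state and the version still match what the caller saw. */
    synchronized boolean deleteNode(String path, String expectedState, int expectedVersion) {
        Node n = nodes.get(path);
        if (n == null || !n.state.equals(expectedState)) {
            return false;
        }
        if (expectedVersion != ANY_VERSION && n.version != expectedVersion) {
            return false;
        }
        nodes.remove(path);
        return true;
    }
}
```

The caller is expected to act on the boolean: only when the delete succeeds should the region be removed from RIT and its owner recorded, which is the step the issue description says was being done unconditionally.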
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122161#comment-13122161 ] Liyin Tang commented on HBASE-4469: --- Yes, I didn't change the unit test TestBlocksRead, which passes successfully. Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this fact to avoid seeking to the top of row.
[jira] [Updated] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222
[ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HBASE-4282: - Affects Version/s: 0.90.5 0.94.0 0.92.0 Status: Patch Available (was: Open) @Stack (or anyone else), can you take a look at the updated patch for trunk -- HBASE-4282_trunk_3.patch? Since HBASE-4487 was only applied to trunk, the previous version should still be applicable for 0.90/0.92. Potential data loss in retries of WAL close introduced in HBASE-4222 Key: HBASE-4282 URL: https://issues.apache.org/jira/browse/HBASE-4282 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if: * A table has DEFERRED_LOG_FLUSH=true * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here. We could easily flag this state and only ride over the close error if there aren't unflushed entries. This would bring the above condition back to the previous behavior of aborting the region server. However, aborting the region server in this state is still guaranteeing data loss. Is there anything we can do better in this case?
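The "flag this state" proposal in the description can be pictured with a very small Java sketch. It assumes a simple counter of appended-but-unsynced edits; WalCloseSketch, Decision and the method names are hypothetical and are not the HLog API or the attached patch.

```java
final class WalCloseSketch {
    /** What the log roller does after a failed close of the current writer. */
    enum Decision { ROLL_AGAIN, ABORT }

    // Edits appended to the writer buffer but not yet synced (deferred log flush).
    private int unflushedEntries;

    void append() {
        unflushedEntries++;
    }

    void sync() {
        unflushedEntries = 0;
    }

    /** Ride over the close error only when no acknowledged edit could be lost. */
    Decision onCloseError() {
        return unflushedEntries == 0 ? Decision.ROLL_AGAIN : Decision.ABORT;
    }
}
```

The sketch only captures the decision; as the description notes, aborting still loses the buffered edits, so it restores the earlier behavior rather than solving the underlying problem.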
[jira] [Created] (HBASE-4546) Upgrade to ZooKeeper 3.3.2 or 3.3.3
Upgrade to ZooKeeper 3.3.2 or 3.3.3 --- Key: HBASE-4546 URL: https://issues.apache.org/jira/browse/HBASE-4546 Project: HBase Issue Type: Improvement Components: zookeeper Reporter: Jonathan Gray Assignee: Jonathan Gray Fix For: 0.92.0 HBase is still depending on 3.3.1. There are many critical bug fixes in 3.3.2 and two more critical fixes in 3.3.3. We recently tripped on ZOOKEEPER-822 which was fixed in 3.3.2.
[jira] [Resolved] (HBASE-4546) Upgrade to ZooKeeper 3.3.2 or 3.3.3
[ https://issues.apache.org/jira/browse/HBASE-4546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray resolved HBASE-4546. -- Resolution: Not A Problem Fix Version/s: (was: 0.92.0) Never mind, I'm looking at a stale pom. We are already on 3.3.3 in 92 and trunk. Upgrade to ZooKeeper 3.3.2 or 3.3.3 --- Key: HBASE-4546 URL: https://issues.apache.org/jira/browse/HBASE-4546 Project: HBase Issue Type: Improvement Components: zookeeper Reporter: Jonathan Gray Assignee: Jonathan Gray HBase is still depending on 3.3.1. There are many critical bug fixes in 3.3.2 and two more critical fixes in 3.3.3. We recently tripped on ZOOKEEPER-822 which was fixed in 3.3.2.
[jira] [Commented] (HBASE-4402) Retaining locality after restart broken
[ https://issues.apache.org/jira/browse/HBASE-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122207#comment-13122207 ] stack commented on HBASE-4402: -- Let me apply the updated patch. The below failures seem unrelated: {code} Failed tests: testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin) testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1> testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1> {code} I saw them in a clean 0.92 run. I'm working on fixing these elsewhere. Retaining locality after restart broken --- Key: HBASE-4402 URL: https://issues.apache.org/jira/browse/HBASE-4402 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: hbase-4402.txt, hbase-4402.txt In DefaultLoadBalancer, we implement the retain assignment function like so: {code} if (sn != null && servers.contains(sn)) { assignments.get(sn).add(region.getKey()); {code} but this will never work since after a cluster restart, all servers have a new ServerName with a new startcode.
[jira] [Updated] (HBASE-4402) Retaining locality after restart broken
[ https://issues.apache.org/jira/browse/HBASE-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4402: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Applied to 0.92 branch and trunk. Thanks Todd. Retaining locality after restart broken --- Key: HBASE-4402 URL: https://issues.apache.org/jira/browse/HBASE-4402 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: 4402-v3.txt, hbase-4402.txt, hbase-4402.txt In DefaultLoadBalancer, we implement the retain assignment function like so: {code} if (sn != null && servers.contains(sn)) { assignments.get(sn).add(region.getKey()); {code} but this will never work since after a cluster restart, all servers have a new ServerName with a new startcode.
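The problem described in HBASE-4402 is that an exact ServerName match can never succeed after a restart, because the startcode changes. One way to picture a fix is to match on the parts that survive a restart. The Java sketch below is an assumption-laden illustration, not the committed patch: RetainAssignmentSketch, its Server class and the choice of host plus port as the matching key are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RetainAssignmentSketch {
    /** Hypothetical stand-in for ServerName: host, port and startcode. */
    static final class Server {
        final String host;
        final int port;
        final long startcode;

        Server(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        /** The part of the identity that survives a restart. */
        String hostAndPort() {
            return host + ":" + port;
        }
    }

    /**
     * Maps each region to the live server that runs on the same host and port
     * as its previous server, ignoring the startcode. Regions without a match
     * are left out and would be assigned some other way.
     */
    static Map<String, Server> retain(Map<String, Server> oldAssignments, List<Server> liveServers) {
        Map<String, Server> liveByHostPort = new HashMap<>();
        for (Server s : liveServers) {
            liveByHostPort.put(s.hostAndPort(), s);
        }
        Map<String, Server> retained = new LinkedHashMap<>();
        for (Map.Entry<String, Server> e : oldAssignments.entrySet()) {
            Server old = e.getValue();
            Server match = (old == null) ? null : liveByHostPort.get(old.hostAndPort());
            if (match != null) {
                retained.put(e.getKey(), match);
            }
        }
        return retained;
    }
}
```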
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122232#comment-13122232 ] Nicolas Spiegelberg commented on HBASE-4469: +1. lgtm Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.
[jira] [Commented] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222
[ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122264#comment-13122264 ] Ted Yu commented on HBASE-4282: --- +1 on patch v3. There seems to be some missing javadoc for TestLogRollAbort. Please remove the following line in TestLogRollAbort: {code} / configuration for testLogRollOnDatanodeDeath / {code} Potential data loss in retries of WAL close introduced in HBASE-4222 Key: HBASE-4282 URL: https://issues.apache.org/jira/browse/HBASE-4282 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if: * A table has DEFERRED_LOG_FLUSH=true * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here. We could easily flag this state and only ride over the close error if there aren't unflushed entries. This would bring the above condition back to the previous behavior of aborting the region server. However, aborting the region server in this state is still guaranteeing data loss. Is there anything we can do better in this case?
[jira] [Commented] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222
[ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122285#comment-13122285 ] Gary Helmling commented on HBASE-4282: -- Yes, bad copy-n-paste job on my part. I will clean up. Thanks for the review, Ted. Potential data loss in retries of WAL close introduced in HBASE-4222 Key: HBASE-4282 URL: https://issues.apache.org/jira/browse/HBASE-4282 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if: * A table has DEFERRED_LOG_FLUSH=true * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here. We could easily flag this state and only ride over the close error if there aren't unflushed entries. This would bring the above condition back to the previous behavior of aborting the region server. However, aborting the region server in this state is still guaranteeing data loss. Is there anything we can do better in this case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter
[ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122289#comment-13122289 ] jirapos...@reviews.apache.org commented on HBASE-4469:
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2235/#review2417
+1. Nice optimization, Liyin. Changes look good. [This is running nicely on our internal branch.] - Kannan
On 2011-10-06 17:17:23, Liyin wrote:
bq. Review request for hbase. (Updated 2011-10-06 17:17:23)
bq. Summary: The problem is that when seeking for the row/col in the hfile, we go to the top of the row in order to check for a row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then when a delete family operation is done on a row, the row has already been added to the bloomfilter. We can take advantage of this fact to avoid seeking to the top of the row. Also updated the TestBlocksRead unit tests, since most of the block read counts have dropped to a lower number.
bq. Evaluation: In TestSeekingOptimization, the optimization previously saved 31.60% of seek operations. It now saves about 41.82%, roughly 10 percentage points more.
bq. Before this diff: For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%
bq. With this diff applied: For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
bq. Thanks to Mikhail and Kannan for their help and discussion.
bq. This addresses bug HBASE-4469. https://issues.apache.org/jira/browse/HBASE-4469
bq. Diffs:
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 7b0b9e6
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 8dd8a68
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java abccea4
bq. Diff: https://reviews.apache.org/r/2235/diff
bq. Testing: Ran all the unit tests. Two unit tests fail both with and without my change: TestDistributedLogSplitting and TestHTablePool.
bq. Thanks, Liyin
Avoid top row seek by looking up bloomfilter Key: HBASE-4469 URL: https://issues.apache.org/jira/browse/HBASE-4469 Project: HBase Issue Type: Improvement Reporter: Liyin Tang Assignee: Liyin Tang
The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.
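The savings figures quoted in the review request can be rechecked from the raw seek counts. The following is only an arithmetic sketch; the class and method names are hypothetical and are not part of the patch or of TestSeekingOptimization.

```java
import java.util.Locale;

// Recomputes the savings percentages quoted in the review request
// from the raw seek counts (2506 without optimization; 1714 and 1458 with).
final class SeekSavings {
    /** Percentage of seeks avoided, given the counts without and with the optimization. */
    static double savingsPercent(long seeksWithout, long seeksWith) {
        return 100.0 * (seeksWithout - seeksWith) / seeksWithout;
    }

    public static void main(String[] args) {
        long baseline = 2506; // total seeks without optimization, from the summary
        System.out.println(String.format(Locale.ROOT, "before this diff: %.2f%%", savingsPercent(baseline, 1714)));
        System.out.println(String.format(Locale.ROOT, "with this diff:   %.2f%%", savingsPercent(baseline, 1458)));
    }
}
```

The difference between the two results is about 10 percentage points, which matches the "10% more" remark in the evaluation.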
[jira] [Updated] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-4536: Fix Version/s: (was: 0.92.0)
Turns out this is a bit more complicated than I thought. There are three types of deletes:
1. version deletes - effective for a specific version of a specific column
2. column deletes - effective for all versions of a specific column
3. family deletes - effective for all versions of all columns of a family
The first two are sorted before the puts they affect based on their respective timestamps, but after newer puts. Family deletes always sort before all versions of all columns. The problem is deciding when the delete rows (the marker rows) themselves can be removed during a major compaction. For #1 and #2 I can just do version counting, and newer puts will eventually push out the delete markers from the store. With #3 this will never happen, as they always sort before all puts of the same family, regardless of any timestamp set on them. Here it is necessary to scan all puts for that family and then decide whether the delete needs to be included based on whether the delete had any effect on any of the puts in the same family. Because of this, moving out of 0.92 as changes will be bigger. Put back if you think otherwise. I still think that timetravel is an important feature of HBase and incomplete if it cannot include deleted rows.
Allow CF to retain deleted rows --- Key: HBASE-4536 URL: https://issues.apache.org/jira/browse/HBASE-4536 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0
Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error. I see two options here:
1. Add a new flag to HColumnDescriptor, something like RETAIN_DELETED.
2. Fold this into the parent change, i.e. keep minimum-number-of-versions versions even past the delete marker.
#1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint). Comments? Any other options?
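The three delete types described in the comment can be modelled in a few lines. This is a simplified sketch, not HBase code: the Put and Delete classes, the covers() rule and affectsAny() are hypothetical stand-ins, and the model assumes all cells belong to one row and one column family.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of which puts each delete type is "effective for".
final class DeleteMarkers {
    enum Type { VERSION, COLUMN, FAMILY }

    static final class Put {
        final String column;
        final long ts;
        Put(String column, long ts) { this.column = column; this.ts = ts; }
    }

    static final class Delete {
        final Type type;
        final String column; // unused for FAMILY deletes
        final long ts;
        Delete(Type type, String column, long ts) { this.type = type; this.column = column; this.ts = ts; }
    }

    /** True if the delete marker masks the given put. */
    static boolean covers(Delete d, Put p) {
        switch (d.type) {
            case VERSION: return d.column.equals(p.column) && d.ts == p.ts; // one version of one column
            case COLUMN:  return d.column.equals(p.column) && p.ts <= d.ts; // all older versions of one column
            case FAMILY:  return p.ts <= d.ts;                              // all older versions of all columns
            default: throw new AssertionError(d.type);
        }
    }

    /** The check the comment proposes for family deletes: did the marker affect any put in the family? */
    static boolean affectsAny(Delete d, List<Put> familyPuts) {
        for (Put p : familyPuts) {
            if (covers(d, p)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Put> puts = Arrays.asList(new Put("a", 10), new Put("b", 20));
        System.out.println(affectsAny(new Delete(Type.FAMILY, null, 15), puts)); // masks a@10
        System.out.println(affectsAny(new Delete(Type.FAMILY, null, 5), puts));  // masks nothing
    }
}
```

In this model a family marker that affects no put is a candidate for removal at major compaction, which is the decision the comment says requires scanning all puts of the family.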
[jira] [Created] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack
I've been running tests before commit and found that the following failures are sporadic but fairly frequent:
{code}
Failed tests:
testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems we fail to find .tableinfo in the tests that modify the table schema while the table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file, or we get an EOFException reading it.
[jira] [Updated] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4547: Attachment: 4547.txt
This patch, which creates the file in a tmp dir and then does a delete and rename, seems to fix the failing TestAdmin in repeated runs.
TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Attachments: 4547.txt
I've been running tests before commit and found that the following failures are sporadic but fairly frequent:
{code}
Failed tests:
testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems we fail to find .tableinfo in the tests that modify the table schema while the table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file, or we get an EOFException reading it.
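The create-in-tmp, delete, rename sequence mentioned for 4547.txt can be sketched as follows. This is an illustration only: it assumes a local file system and java.nio.file rather than the Hadoop FileSystem API the patch itself would use, and the class name, the ".tmp" directory name and the demo() helper are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Writes the new content to a temporary location first, so a reader never
// observes a half-written target file, then swaps it into place.
final class TableInfoWriter {
    static void replace(Path target, byte[] content) throws IOException {
        Path tmpDir = target.resolveSibling(".tmp");            // hypothetical tmp dir next to the target
        Files.createDirectories(tmpDir);
        Path tmp = tmpDir.resolve(target.getFileName().toString());
        Files.write(tmp, content);                              // 1. create the complete file in the tmp dir
        Files.deleteIfExists(target);                           // 2. delete the old file
        Files.move(tmp, target);                                // 3. rename the new file into place
    }

    /** Runs two successive replacements in a scratch directory and returns the final content. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("tableinfo-sketch");
            Path target = dir.resolve(".tableinfo");
            replace(target, "v1".getBytes(StandardCharsets.UTF_8));
            replace(target, "v2".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with overwriting in place, a reader in this scheme sees either a complete file or no file at all. There is still a short window between the delete and the rename in which the file is missing, so a reader may need to retry.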
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122311#comment-13122311 ] Jonathan Gray commented on HBASE-4536:
Lars, I agree that this is an important feature. I also agree that we should take the time to do it right and not push for 0.92. Could we just support some kind of raw scanner along with a TTKAKV config (Time To Keep All Key Values)?
Allow CF to retain deleted rows --- Key: HBASE-4536 URL: https://issues.apache.org/jira/browse/HBASE-4536 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0
Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error. I see two options here:
1. Add a new flag to HColumnDescriptor, something like RETAIN_DELETED.
2. Fold this into the parent change, i.e. keep minimum-number-of-versions versions even past the delete marker.
#1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint). Comments? Any other options?
[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
[ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122322#comment-13122322 ] Lars Hofhansl commented on HBASE-4536:
A simple option would be to allow the KEEP_DELETED flag to be set only when TTL is also set. Then we'd do simple version counting for #1 and #2 type deletes and rely on TTL to expire #3. (That means you could have more #3 delete markers than max versions, which would also be the case with TTKAKV. That might be acceptable.)
Allow CF to retain deleted rows --- Key: HBASE-4536 URL: https://issues.apache.org/jira/browse/HBASE-4536 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.94.0
Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions. However, if a client deletes a row, all versions older than the delete tombstone will be removed at the next major compaction (and even at memstore flush - see HBASE-4241). There should be a way to retain those versions to guard against software error. I see two options here:
1. Add a new flag to HColumnDescriptor, something like RETAIN_DELETED.
2. Fold this into the parent change, i.e. keep minimum-number-of-versions versions even past the delete marker.
#1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a user viewpoint). Comments? Any other options?
[jira] [Commented] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222
[ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122326#comment-13122326 ] Gary Helmling commented on HBASE-4282:
bq. On v3, the txids are pretty useless at least out in logs? No harm logging them I suppose but there is nothing I can infer given a txid? Is that so?
Yes, txids are not so useful. I can drop them from the logs. I left them in as the analog of the previous version's deferred seqNum, which are moderately more useful.
{code}
-if (unflushedEntries.get() <= syncedTillHere) {
-  Thread.sleep(this.optionalFlushInterval);
-}
+Thread.sleep(this.optionalFlushInterval);
{code}
This reverts what I think is a dangerous change introduced by HBASE-4487. If the sync fails, the if condition will be false, making the LogSyncer thread go into a hard loop until the sync succeeds. This is going to interfere with attempting to perform the log roll, so I think it at least needs to be throttled. The simplest change seemed to be restoring the previous behavior. I can move this into a separate issue, if you think broader discussion would be good.
{code}
+TEST_UTIL.cleanupTestDir();
+TEST_UTIL.shutdownMiniCluster();
{code}
cleanupTestDir() actually deletes the test directory in HDFS, so the cluster would need to be running for it. But shutdownMiniCluster() does its own cleanup of the local FS dirs for testing, so I don't think we need the additional cleanupTestDir() at all.
{code}
+assertTrue("Need HDFS-826 for this test", log.canGetCurReplicas());
{code}
Sure, I'll add that in.
Potential data loss in retries of WAL close introduced in HBASE-4222 Key: HBASE-4282 URL: https://issues.apache.org/jira/browse/HBASE-4282 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if: * A table has DEFERRED_LOG_FLUSH=true * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here. We could easily flag this state and only ride over the close error if there aren't unflushed entries. This would bring the above condition back to the previous behavior of aborting the region server. However, aborting the region server in this state is still guaranteeing data loss. Is there anything we can do better in this case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
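The hard-loop concern raised in the comment above can be shown with a small simulation. It is a sketch under stated assumptions, not the HLog.LogSyncer code: the class and method are hypothetical, the comparison in the removed condition is assumed to be `<=` (the operator appears garbled in the archived message), and a counter stands in for Thread.sleep(optionalFlushInterval).

```java
// Counts how often a syncer loop would sleep while every sync attempt fails.
final class SyncLoopSketch {
    /**
     * @param conditionalSleep true to sleep only when nothing is pending (the HBASE-4487 behavior
     *                         as described in the comment), false to sleep on every pass
     * @param passes           number of loop iterations to simulate
     */
    static int sleepsWhileSyncFails(boolean conditionalSleep, int passes) {
        long unflushedEntries = 5; // edits appended but not yet synced
        long syncedTillHere = 0;   // never advances because sync keeps failing
        int sleeps = 0;
        for (int i = 0; i < passes; i++) {
            if (!conditionalSleep || unflushedEntries <= syncedTillHere) {
                sleeps++;          // stands in for Thread.sleep(optionalFlushInterval)
            }
            // sync() would be attempted here and fail, leaving syncedTillHere unchanged
        }
        return sleeps;
    }

    public static void main(String[] args) {
        System.out.println(sleepsWhileSyncFails(true, 10));  // conditional sleep: never pauses
        System.out.println(sleepsWhileSyncFails(false, 10)); // unconditional sleep: pauses every pass
    }
}
```

With the conditional sleep the loop never pauses while entries stay unflushed, which is the unthrottled retry the comment warns about. Sleeping unconditionally bounds the retry rate to one attempt per flush interval.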
[jira] [Updated] (HBASE-4482) Race Condition Concerning Eviction in SlabCache
[ https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4482: - Attachment: hbase-4482v4.2.txt Here is what I applied copied from RB. Race Condition Concerning Eviction in SlabCache --- Key: HBASE-4482 URL: https://issues.apache.org/jira/browse/HBASE-4482 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Priority: Blocker Fix For: 0.92.0 Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt, hbase-4482v4.2.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4482) Race Condition Concerning Eviction in SlabCache
[ https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4482: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.92 branch and trunk because of Ted and Jon +1s. Thanks for the patch Li Pi. Race Condition Concerning Eviction in SlabCache --- Key: HBASE-4482 URL: https://issues.apache.org/jira/browse/HBASE-4482 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Priority: Blocker Fix For: 0.92.0 Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt, hbase-4482v4.2.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-4430) Disable TestSlabCache and TestSingleSizedCache temporarily to see if these are cause of build box failure though all tests pass
[ https://issues.apache.org/jira/browse/HBASE-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4430. -- Resolution: Fixed Marking resolved by hbase-4482; that patch reenabled these tests. Disable TestSlabCache and TestSingleSizedCache temporarily to see if these are cause of build box failure though all tests pass --- Key: HBASE-4430 URL: https://issues.apache.org/jira/browse/HBASE-4430 Project: HBase Issue Type: Task Components: test Reporter: stack Assignee: Li Pi Priority: Blocker Fix For: 0.92.0 Attachments: TestSlabCache.trace -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222
[ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122337#comment-13122337 ] stack commented on HBASE-4282: -- I'm good on commit as is. I'd say open new issue to fix the 'dangerous change' in trunk. And you don't need to add in the check for hdfs-826. You already have it in there. Good stuff G. Potential data loss in retries of WAL close introduced in HBASE-4222 Key: HBASE-4282 URL: https://issues.apache.org/jira/browse/HBASE-4282 Project: HBase Issue Type: Bug Affects Versions: 0.92.0, 0.94.0, 0.90.5 Reporter: Gary Helmling Assignee: Gary Helmling Priority: Blocker Fix For: 0.92.0, 0.90.5 Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch The ability to ride over WAL close errors on log rolling added in HBASE-4222 could lead to missing HLog entries if: * A table has DEFERRED_LOG_FLUSH=true * There are unflushed WALEdit entries for that table in the current SequenceFile writer buffer Since the writes were already acknowledged to the client, just ignoring the close error to allow for another log roll doesn't seem like the right thing to do here. We could easily flag this state and only ride over the close error if there aren't unflushed entries. This would bring the above condition back to the previous behavior of aborting the region server. However, aborting the region server in this state is still guaranteeing data loss. Is there anything we can do better in this case? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-1621) merge tool should work on online cluster, but disabled table
[ https://issues.apache.org/jira/browse/HBASE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-1621: - Priority: Major (was: Blocker) Undoing this as blocker now we have a merge script that has been run a few times in production; having such script takes the heat off the need for this... but we still need it. Marking major. merge tool should work on online cluster, but disabled table Key: HBASE-1621 URL: https://issues.apache.org/jira/browse/HBASE-1621 Project: HBase Issue Type: Bug Reporter: ryan rawson Assignee: stack Fix For: 0.92.0 Attachments: 1621-trunk.txt, HBASE-1621-v2.patch, HBASE-1621.patch, hbase-onlinemerge.patch, online_merge.rb taking down the entire cluster to merge 2 regions is a pain, i dont see why the table or regions specifically couldnt be taken offline, then merged then brought back up. this might need a new API to the regionservers so they can take direction from not just the master. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4547: Priority: Critical (was: Major) Fix Version/s: 0.92.0 Assignee: stack
Bringing into 0.92 and marking critical.
TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547.txt
I've been running tests before commit and found that the following failures are sporadic but fairly frequent:
{code}
Failed tests:
testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems we fail to find .tableinfo in the tests that modify the table schema while the table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file, or we get an EOFException reading it.
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122344#comment-13122344 ] Jonathan Hsieh commented on HBASE-4377:
In the 0.90 branch, after deleting meta and restarting, the number of tables present is 0. In trunk and the 0.92 branch, after deleting meta and restarting, the number of tables present is 1. This actually does make sense because HBASE-451 changed the behavior of HMaster: in 0.90 (pre-HBASE-451), HConnectionManager.listTables() loads table info on the client side via a meta scan. Post HBASE-451, table data from HConnectionManager.listTables() comes from the file system, is cached by the HMaster, and ignores the meta table.
[hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch
In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
[jira] [Resolved] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4547. Resolution: Fixed
Ran this a bunch of times and couldn't get TestAdmin to fail. Applied to 0.92 branch and trunk.
TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547.txt
I've been running tests before commit and found that the following failures are sporadic but fairly frequent:
{code}
Failed tests:
testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems we fail to find .tableinfo in the tests that modify the table schema while the table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file, or we get an EOFException reading it.
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122354#comment-13122354 ] Todd Lipcon commented on HBASE-4377:
bq. Post HBASE-451, table data from HConnectionManager.listTables() comes from the file system and is cached by the HMaster, and ignores the meta table
This seems like a bug - clients should never have to have direct access to HDFS! I filed HBASE-4548.
[hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch
In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
[jira] [Created] (HBASE-4548) Client should not look on HDFS to list tables
Client should not look on HDFS to list tables - Key: HBASE-4548 URL: https://issues.apache.org/jira/browse/HBASE-4548 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In HBASE-4377, Jon noticed that HConnectionManager.listTable now looks on HDFS for the table list. This seems incorrect, since the client may not have access to the hbase directory on HDFS (eg in a secure cluster). At the least, it should RPC to the master to find a table list, and have the master do the list on HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4549) Add thrift API to read version and build date of HBase
Add thrift API to read version and build date of HBase --- Key: HBASE-4549 URL: https://issues.apache.org/jira/browse/HBASE-4549 Project: HBase Issue Type: Improvement Components: thrift Reporter: Song Liu Priority: Minor Adding API to get the hbase server version and build date will be helpful for the client to communicate with different versions of the server accordingly. class VersionInfo can be reused to provide required information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4402) Retaining locality after restart broken
[ https://issues.apache.org/jira/browse/HBASE-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122364#comment-13122364 ] Hudson commented on HBASE-4402:
Integrated in HBase-0.92 #48 (See [https://builds.apache.org/job/HBase-0.92/48/]) HBASE-4402 Retaining locality after restart broken
stack : Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/DefaultLoadBalancer.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestDefaultLoadBalancer.java
Retaining locality after restart broken --- Key: HBASE-4402 URL: https://issues.apache.org/jira/browse/HBASE-4402 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.92.0 Attachments: 4402-v3.txt, hbase-4402.txt, hbase-4402.txt
In DefaultLoadBalancer, we implement the retain assignment function like so:
{code}
if (sn != null && servers.contains(sn)) {
  assignments.get(sn).add(region.getKey());
{code}
but this will never work since after a cluster restart, all servers have a new ServerName with a new startcode.
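The reason the contains() check fails after a restart can be reproduced with a toy model. This is a sketch only: the ServerName class below is a hypothetical stand-in for HBase's own, and matching on host and port is one possible remedy, not necessarily what the committed patch does.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

final class RetainAssignmentSketch {
    /** Toy server identity: host, port and the startcode assigned at process start. */
    static final class ServerName {
        final String host;
        final int port;
        final long startcode;

        ServerName(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof ServerName)) return false;
            ServerName s = (ServerName) o;
            return host.equals(s.host) && port == s.port && startcode == s.startcode;
        }

        @Override public int hashCode() { return Objects.hash(host, port, startcode); }
    }

    /** Finds the live server on the same host and port, ignoring the startcode; null if none. */
    static ServerName findByHostAndPort(ServerName old, Collection<ServerName> live) {
        for (ServerName s : live) {
            if (s.host.equals(old.host) && s.port == old.port) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        ServerName beforeRestart = new ServerName("rs1.example.org", 60020, 100L);
        List<ServerName> live = Arrays.asList(new ServerName("rs1.example.org", 60020, 200L));
        System.out.println(live.contains(beforeRestart));                    // false: startcode differs
        System.out.println(findByHostAndPort(beforeRestart, live) != null);  // true: same host and port
    }
}
```

Because equality includes the startcode, the old ServerName is never found among the restarted servers, so the retain branch is skipped for every region.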
[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.
[ https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122367#comment-13122367 ] Jonathan Hsieh commented on HBASE-4377: --- @Todd, I think there is some confusion. Clients do not directly access HDFS. Let me add more detail. In trunk, post HBASE-451, the HMaster (not the client) reads and caches data from the file system. It then serves the HTableDescriptors to clients over RPC: the client's HConnectionManager talks to the HMaster, which just ships the cached HTD data. HMaster on initialization reads the file system for HTD data. Client calls listTables() -> HMaster (serves cached data from the file system). Pre-HBASE-451, the client's HConnectionManager does a meta scan and builds HTableDescriptors. Client calls listTables(), which is actually a meta scan that builds the HTDs. [hbck] Offline rebuild .META. from fs data only. Key: HBASE-4377 URL: https://issues.apache.org/jira/browse/HBASE-4377 Project: HBase Issue Type: New Feature Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, hbase-4377-trunk.v2.patch In a worst case situation, it may be helpful to have an offline .META. rebuilder that just looks at the file system's .regioninfos and rebuilds meta from scratch. Users could move bad regions out until there is a clean rebuild. It would likely fill in region split holes. Follow-on work could give options to merge or select regions that overlap, or do online rebuilds.
[jira] [Commented] (HBASE-4548) Client should not look on HDFS to list tables
[ https://issues.apache.org/jira/browse/HBASE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122369#comment-13122369 ] Jonathan Hsieh commented on HBASE-4548: --- @Todd, (also posted in HBASE-4377). I think there is some confusion. Clients do not directly access HDFS. Let me add more detail. In trunk, post HBASE-451, the HMaster (not the client) reads and caches data from the file system. It then serves the HTableDescriptors to clients over RPC: the client's HConnectionManager talks to the HMaster, which just ships the cached HTD data. HMaster on initialization reads the file system for HTD data. Client calls listTables() -> HMaster (serves cached data from the file system). Pre-HBASE-451, the client's HConnectionManager does a meta scan and builds HTableDescriptors. Client calls listTables(), which is actually a meta scan that builds the HTDs. Client should not look on HDFS to list tables - Key: HBASE-4548 URL: https://issues.apache.org/jira/browse/HBASE-4548 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In HBASE-4377, Jon noticed that HConnectionManager.listTable now looks on HDFS for the table list. This seems incorrect, since the client may not have access to the hbase directory on HDFS (e.g. in a secure cluster). At the least, it should RPC to the master to find a table list, and have the master do the list on HDFS.
[jira] [Resolved] (HBASE-4548) Client should not look on HDFS to list tables
[ https://issues.apache.org/jira/browse/HBASE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh resolved HBASE-4548. --- Resolution: Not A Problem Client should not look on HDFS to list tables - Key: HBASE-4548 URL: https://issues.apache.org/jira/browse/HBASE-4548 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In HBASE-4377, Jon noticed that HConnectionManager.listTable now looks on HDFS for the table list. This seems incorrect, since the client may not have access to the hbase directory on HDFS (e.g. in a secure cluster). At the least, it should RPC to the master to find a table list, and have the master do the list on HDFS.
[jira] [Commented] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122370#comment-13122370 ] Jonathan Gray commented on HBASE-4547: -- Post-commit +1. Stack, should we open another JIRA to deal with your TODO? TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547.txt I've been running tests before commit and found the following happens with some regularity, sporadic of course, but they fail fairly frequently:
{code}
Failed tests:
  testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
  testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
  testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems like we fail to find .tableinfo in the tests that modify table schema while table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file or we get EOFE reading it.
[jira] [Commented] (HBASE-4549) Add thrift API to read version and build date of HBase
[ https://issues.apache.org/jira/browse/HBASE-4549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122373#comment-13122373 ] Jonathan Gray commented on HBASE-4549: -- +1 Add thrift API to read version and build date of HBase --- Key: HBASE-4549 URL: https://issues.apache.org/jira/browse/HBASE-4549 Project: HBase Issue Type: Improvement Components: thrift Reporter: Song Liu Priority: Minor Original Estimate: 2h Remaining Estimate: 2h Adding API to get the hbase server version and build date will be helpful for the client to communicate with different versions of the server accordingly. class VersionInfo can be reused to provide required information.
[jira] [Commented] (HBASE-4548) Client should not look on HDFS to list tables
[ https://issues.apache.org/jira/browse/HBASE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122372#comment-13122372 ] Jonathan Hsieh commented on HBASE-4548: --- closed out as not a problem. Client should not look on HDFS to list tables - Key: HBASE-4548 URL: https://issues.apache.org/jira/browse/HBASE-4548 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In HBASE-4377, Jon noticed that HConnectionManager.listTable now looks on HDFS for the table list. This seems incorrect, since the client may not have access to the hbase directory on HDFS (e.g. in a secure cluster). At the least, it should RPC to the master to find a table list, and have the master do the list on HDFS.
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122417#comment-13122417 ] jirapos...@reviews.apache.org commented on HBASE-4528: --
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2141/#review2397
---

Overall I see this patch as trading off service resiliency in favor of performance. With the current ordering of operations (WAL append and sync prior to memstore insert), we ensure that an error during sync is seen by the client and memstore consistency is maintained. Importantly (at least for my goals), this also allows us to do some reasoning about when it's necessary to abort the region server or when we can take additional actions to try to ride over a transient error. As long as there were no deferred flush edits, we could reason that any error on sync was propagated back to the client as a failure and we did not need to abort yet. This is the direction I've been trying to move with HBASE-4222/4282 and a partial form of it was already in place prior to that.

I understand why we want to reorder these operations and move the sync outside of the acquired row locks. From this standpoint, since an error on sync leaves the memstore polluted, aborting immediately is the right thing to do. But I don't think it's a desirable behavior. I think it will lead to more complaints from users about observed instability of the system.

The use-case that motivated HBASE-4222 was performing a rolling restart of all DataNodes in a cluster, with a running, but completely quiescent HBase cluster. In this case, with no data durability at stake, we really should be able to recover. But instead what will happen is a catastrophic failure of RegionServers as each server tries to roll its HLog. The patch in its current state would regress to this behavior, triggering RS aborts even more quickly than prior to HBASE-4222 (no HLog close would be attempted).
I would really like to find a way to keep the performance optimization of moving the HLog sync outside of the row locks, while still being able to guarantee memstore consistency in the case of failure, so that we can still reason about whether or not a RS abort is really necessary.

Speaking naively, is it at all feasible that the RWCC.WriteEntry could track the KeyValue instances it was used to apply to the memstore? These references could then be used to attempt a memstore rollback on failure. Any other ways that we can maintain memstore consistency here without giving up and aborting?

/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
https://reviews.apache.org/r/2141/#comment5501

Personally, I think this is a step in the wrong direction. I would like to see us be _more_ resilient in the face of transient HDFS errors, as long as we have sufficient information to reason that we have not compromised correctness.

- Gary

On 2011-10-06 08:08:49, Dhruba Borthakur wrote:
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2141/
bq. ---
bq.
bq. (Updated 2011-10-06 08:08:49)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. ---
bq.
bq. This changes the multiPut operation so that the sync to the WAL occurs outside the rowlock.
bq.
bq. This enhancement is done only to HRegion.mut(Put[]) because this is the only method that gets invoked from an application. The HRegion.put(Put) is used only by unit tests and should possibly be deprecated.
bq.
bq. I have attached a unit test. I have not yet run all unit tests, but early feedback on this patch will be very helpful.
bq.
bq. This addresses bug HBASE-4528.
bq. https://issues.apache.org/jira/browse/HBASE-4528
bq.
bq. Diffs
bq. -
bq.
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1179529
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1179529
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1179529
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1179529
bq.   /src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 1179529
bq.   /src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION
bq.   /src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1179529
bq.
bq. Diff: https://reviews.apache.org/r/2141/diff
bq.
bq. Testing
bq. ---
bq.
bq. Not yet run the full suite of unit tests.
bq.
bq. Thanks,
bq.
bq. Dhruba
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122433#comment-13122433 ] Ted Yu commented on HBASE-4528: --- I was thinking about the possibility of memstore rollback as well. Here are the operations in applyFamilyMapToMemstore() whose effect needs to be rolled back:
{code}
for (KeyValue kv: edits) {
  kv.setMemstoreTS(w.getWriteNumber());
  size += store.add(kv);
}
{code}
The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122436#comment-13122436 ] Jonathan Gray commented on HBASE-4528: -- Dhruba and I just talked about this. I also like the MemStore rollback. It should not be that difficult, just removing the List<KV> that we added. The put operation can release the rowlock before sync-ing the Hlog -- Key: HBASE-4528 URL: https://issues.apache.org/jira/browse/HBASE-4528 Project: HBase Issue Type: Improvement Components: regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
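The rollback idea discussed in this thread (apply edits to the memstore, sync the log afterwards, and remove the edits again if the sync fails) could look roughly like the sketch below. It is an assumption for illustration only: `MemstoreRollbackSketch` and `WalSync` are hypothetical names, a sorted set of strings stands in for the memstore, and read visibility (the job of RWCC in the real code) is ignored.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListSet;

final class MemstoreRollbackSketch {
    // Hypothetical stand-in for the log sync call; throws when the sync fails.
    interface WalSync {
        void sync() throws IOException;
    }

    // A sorted concurrent set of strings stands in for the memstore.
    final ConcurrentSkipListSet<String> memstore = new ConcurrentSkipListSet<>();

    // Applies the edits first, then syncs. If the sync fails, removes exactly
    // the edits this call inserted and reports failure to the caller.
    boolean putThenSync(List<String> edits, WalSync wal) {
        List<String> added = new ArrayList<>();
        for (String kv : edits) {
            if (memstore.add(kv)) {
                added.add(kv); // remember only what this call actually inserted
            }
        }
        try {
            wal.sync();
            return true;
        } catch (IOException e) {
            for (String kv : added) {
                memstore.remove(kv); // rollback: undo this call's inserts only
            }
            return false;
        }
    }
}
```

Tracking only the entries this call inserted keeps the rollback from deleting an identical edit that another writer had already committed.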
[jira] [Commented] (HBASE-4482) Race Condition Concerning Eviction in SlabCache
[ https://issues.apache.org/jira/browse/HBASE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122447#comment-13122447 ] Hudson commented on HBASE-4482: --- Integrated in HBase-0.92 #49 (See [https://builds.apache.org/job/HBase-0.92/49/]) HBASE-4482 Race Condition Concerning Eviction in SlabCache stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SingleSizeCache.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabCache.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/slab/SlabItemEvictionWatcher.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSingleSizeCache.java * /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/io/hfile/slab/TestSlabCache.java Race Condition Concerning Eviction in SlabCache --- Key: HBASE-4482 URL: https://issues.apache.org/jira/browse/HBASE-4482 Project: HBase Issue Type: Sub-task Reporter: Li Pi Assignee: Li Pi Priority: Blocker Fix For: 0.92.0 Attachments: hbase-4482v1.txt, hbase-4482v2.txt, hbase-4482v4.2.txt, hbase-4482v4.2.txt
[jira] [Commented] (HBASE-4547) TestAdmin failing in 0.92 because .tableinfo not found
[ https://issues.apache.org/jira/browse/HBASE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122446#comment-13122446 ] Hudson commented on HBASE-4547: --- Integrated in HBase-0.92 #49 (See [https://builds.apache.org/job/HBase-0.92/49/]) HBASE-4547 TestAdmin failing in 0.92 because .tableinfo not found stack : Files : * /hbase/branches/0.92/CHANGES.txt * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java TestAdmin failing in 0.92 because .tableinfo not found -- Key: HBASE-4547 URL: https://issues.apache.org/jira/browse/HBASE-4547 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 4547.txt I've been running tests before commit and found the following happens with some regularity, sporadic of course, but they fail fairly frequently:
{code}
Failed tests:
  testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
  testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
  testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
{code}
Looking, it seems like we fail to find .tableinfo in the tests that modify table schema while table is online. The update of a table schema just does an overwrite. In the tests we sometimes fail to find the newly written file or we get EOFE reading it.
[jira] [Updated] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Kuehn updated HBASE-4480: --- Attachment: runtest2.sh Testing script to simplfy local testing --- Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: runtest.sh, runtest2.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122449#comment-13122449 ] Scott Kuehn commented on HBASE-4480: @Jesse, @Ted - The script has been extended with these features: print slow/hanging tests, read test names from a file, select unit or unit+integration tests. usage:
{code}
usage: ./runtest2.sh [options] [test-name...]

Run a set of tests. Individual tests may be specified on the command line or in a file specified by -f=FILE, containing one test per line. Runs all tests by default.

options:
  -h       Show this message
  -f=FILE  Run the tests listed in the FILE
  -u       Only run unit tests. Default is to run unit and integration tests
  -n=N     Run each test N times. Default = 1.
  -s=N     Print N slowest tests
  -H       Print which tests are hanging (if any)
{code}
Testing script to simplfy local testing --- Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: runtest.sh, runtest2.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing
[ https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122475#comment-13122475 ] Jesse Yates commented on HBASE-4480: @Scott - awesome, thanks! I'm gonna go play with it tonight. Testing script to simplfy local testing --- Key: HBASE-4480 URL: https://issues.apache.org/jira/browse/HBASE-4480 Project: HBase Issue Type: Improvement Reporter: Jesse Yates Priority: Minor Labels: test Attachments: runtest.sh, runtest2.sh As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a script that would handle more of the finer points of running/checking our test suite. This script should: (1) Allow people to determine which tests are hanging/taking a long time to run (2) Allow rerunning of particular tests to make sure it wasn't an artifact of running the whole suite that caused the failure (3) Allow people to specify to run just unit tests or also integration tests (essentially wrapping calls to 'maven test' and 'maven verify'). This script should just be a convenience script - running tests directly from maven should not be impacted.
[jira] [Commented] (HBASE-4488) Store could miss rows during flush
[ https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122482#comment-13122482 ] Lars Hofhansl commented on HBASE-4488: -- I see that Store.compactStore does the same thing. The same reasoning applies there: currently we are lucky that StoreScanner.next() never returns false when more rows are waiting. There's even a comment about a do/while loop, but then it's just a while loop.
{code}
// since scanner.next() can return 'false' but still be delivering data,
// we have to use a do/while loop.
ArrayList<KeyValue> kvs = new ArrayList<KeyValue>();
// Limit to hbase.hstore.compaction.kv.max (default 10) to avoid OOME
while (scanner.next(kvs,this.compactionKVMax)) {
{code}
Looking at the history of the file, it has been like this forever. This is a bug waiting to happen. Should we have another patch with this one, or a separate jira? Store could miss rows during flush -- Key: HBASE-4488 URL: https://issues.apache.org/jira/browse/HBASE-4488 Project: HBase Issue Type: Sub-task Components: regionserver Affects Versions: 0.92.0, 0.94.0 Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.92.0 Attachments: 4488.txt While looking at HBASE-4344 I found that my change HBASE-4241 contains a critical mistake: The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
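The do/while shape that the quoted code comment calls for can be sketched as follows. This is an illustration under assumptions: `BatchScanner` is a hypothetical interface whose `next()` may return false while still having filled the batch, and it is not the real StoreScanner API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ScannerDrainSketch {
    // Hypothetical scanner: next() appends up to 'limit' items to 'out' and
    // returns false when nothing remains AFTER this batch, even if the batch
    // it just delivered is non-empty.
    interface BatchScanner {
        boolean next(List<String> out, int limit);
    }

    // Builds a BatchScanner over an in-memory list, for demonstration only.
    static BatchScanner over(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return (out, limit) -> {
            for (int i = 0; i < limit && it.hasNext(); i++) {
                out.add(it.next());
            }
            return it.hasNext();
        };
    }

    // Drains the scanner with a do/while loop so that the final batch is
    // processed even though next() returned false for it.
    static List<String> drain(BatchScanner scanner, int limit) {
        List<String> all = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(batch, limit);
            all.addAll(batch); // consume the batch before testing 'more'
            batch.clear();
        } while (more);
        return all;
    }
}
```

With a plain `while (scanner.next(batch, limit))` loop over the same scanner, the last batch would be skipped whenever `next()` returns false together with data.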
[jira] [Commented] (HBASE-4333) Client does not check for holes in .META.
[ https://issues.apache.org/jira/browse/HBASE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122496#comment-13122496 ] Lars Hofhansl commented on HBASE-4333: -- I would prefer to have a log message on the server, rather than silently (from the viewpoint of the server logs) ignoring holes on the client. With HBASE-4334 in place I propose closing this. Client does not check for holes in .META. - Key: HBASE-4333 URL: https://issues.apache.org/jira/browse/HBASE-4333 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Joe Pallas If there is a temporary hole in .META., the client may get the wrong region from HConnection.locateRegion. HConnectionManager.HConnectionImplementation.locateRegionInMeta should check the end key of the region found with getClosestRowBefore, just as it checks the offline status, when it looks at the region info.
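The end-key check proposed in the issue description can be illustrated with a sorted map standing in for .META. This is only a sketch under assumptions: `MetaHoleCheckSketch` is a hypothetical name, keys are plain strings compared lexicographically rather than byte arrays, and `floorEntry` plays the role of getClosestRowBefore.

```java
import java.util.Map;
import java.util.TreeMap;

final class MetaHoleCheckSketch {
    // 'meta' maps each region's start key to its end key; an empty end key
    // marks the last region. Returns the start key of the region containing
    // 'row', or null if the row falls into a hole.
    static String locate(TreeMap<String, String> meta, String row) {
        Map.Entry<String, String> closest = meta.floorEntry(row); // closest region at or before row
        if (closest == null) {
            return null;
        }
        String endKey = closest.getValue();
        // Without this check, a row inside a hole would be routed to the
        // preceding region even though that region does not cover it.
        boolean inRange = endKey.isEmpty() || row.compareTo(endKey) < 0;
        return inRange ? closest.getKey() : null;
    }
}
```

For example, with regions `["", "b")` and `["m", "")`, a lookup for row `c` returns null instead of the first region.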
[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException
[ https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122515#comment-13122515 ] Lars Hofhansl commented on HBASE-4462: -- So this problem is fixed in 0.92 it seems. If so, Fix Version should be 0.90, so that we get a better idea what's still missing for 0.92. Properly treating SocketTimeoutException Key: HBASE-4462 URL: https://issues.apache.org/jira/browse/HBASE-4462 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.92.0 SocketTimeoutException is currently treated like any IOE inside of HCM.getRegionServerWithRetries and I think this is a problem. This method should only do retries in cases where we are pretty sure the operation will complete, but with STE we already waited for (by default) 60 seconds and nothing happened. I found this while debugging Douglas Campbell's problem on the mailing list where it seemed like he was using the same scanner from multiple threads, but actually it was just the same client doing retries while the first run didn't even finish yet (that's another problem). You could see the first scanner, then up to two other handlers waiting for it to finish in order to run (because of the synchronization on RegionScanner). So what should we do? We could treat STE as a DoNotRetryException and let the client deal with it, or we could retry only once. There's also the option of having a different behavior for get/put/icv/scan, the issue with operations that modify a cell is that you don't know if the operation completed or not (same when a RS dies hard after completing let's say a Put but just before returning to the client).
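The first option raised in the description, treating a socket timeout as non-retriable while still retrying other I/O errors, might be sketched like this. `RetrySketch` and `callWithRetries` are hypothetical names chosen for illustration; this is not the HCM.getRegionServerWithRetries implementation.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class RetrySketch {
    // Runs 'op' up to maxAttempts times. Ordinary IOExceptions are retried;
    // a SocketTimeoutException is rethrown at once, since the caller has
    // already waited a full timeout and the first call may still be running.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException e) {
                throw e; // do not retry: outcome of the first call is unknown
            } catch (IOException e) {
                last = e; // retriable: remember and try again
            }
        }
        throw last;
    }
}
```

The catch order matters here because SocketTimeoutException is a subclass of IOException. A per-operation policy (get/put/icv/scan), as the description also suggests, would need more than this single switch.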
[jira] [Created] (HBASE-4550) When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang
When master passed regionserver different address , because regionserver didn't create new zookeeper znode, as a result stop-hbase.sh is hang --- Key: HBASE-4550 URL: https://issues.apache.org/jira/browse/HBASE-4550 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: wanbin Fix For: 0.90.4 When the master passes the regionserver a different address, the regionserver does not create a new ZooKeeper znode, while the master stores the new address in ServerManager. When stop-hbase.sh is called, the path received by RegionServerTracker.nodeDeleted is the old address, so serverManager.expireServer is never called and stop-hbase.sh hangs.
[jira] [Updated] (HBASE-4551) Small fixes to compile against 0.23-SNAPSHOT
[ https://issues.apache.org/jira/browse/HBASE-4551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HBASE-4551: --- Attachment: hbase-4551.txt Small fixes to compile against 0.23-SNAPSHOT Key: HBASE-4551 URL: https://issues.apache.org/jira/browse/HBASE-4551 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4551.txt - fix pom.xml to properly pull the test artifacts - fix TestHLog to not use the private cluster.getNameNode() API
[jira] [Created] (HBASE-4551) Small fixes to compile against 0.23-SNAPSHOT
Small fixes to compile against 0.23-SNAPSHOT Key: HBASE-4551 URL: https://issues.apache.org/jira/browse/HBASE-4551 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4551.txt - fix pom.xml to properly pull the test artifacts - fix TestHLog to not use the private cluster.getNameNode() API
[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog
[ https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122540#comment-13122540 ]

Kannan Muthukkaruppan commented on HBASE-4528:
---
Dhruba: You wrote: "A single row update improves from 100 puts/sec/server to 5000 puts/sec/server." Can you reconfirm the above? On an hbase-89 based test, I am seeing single row updates do ~1000 puts/sec/server.

The put operation can release the rowlock before sync-ing the Hlog
---
Key: HBASE-4528
URL: https://issues.apache.org/jira/browse/HBASE-4528
Project: HBase
Issue Type: Improvement
Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, appendNoSyncPut3.txt

This allows for better throughput when there are hot rows. A single row update improves from 100 puts/sec/server to 5000 puts/sec/server.
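The idea named in the issue title, releasing the row lock before the log sync, can be sketched roughly as below. This is an assumption-laden illustration, not the attached patch: the class EarlyUnlockPutSketch, its fields, and the ordering of steps are hypothetical, the in-memory lists stand in for the HLog and the memstore, and the "sync" only advances a counter instead of flushing to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of "append, unlock, then sync"; not HBase code. */
public class EarlyUnlockPutSketch {
    private final ReentrantLock rowLock = new ReentrantLock(); // lock for one hot row
    private final List<String> logBuffer = new ArrayList<>();  // appended, maybe not yet synced
    private final List<String> memstore = new ArrayList<>();   // guarded by rowLock
    private final Object logMonitor = new Object();
    private int syncedUpTo = 0;                                 // guarded by logMonitor

    public void put(String edit) {
        int myIndex;
        rowLock.lock();
        try {
            synchronized (logMonitor) {
                logBuffer.add(edit);          // append to the log without syncing
                myIndex = logBuffer.size();
            }
            memstore.add(edit);               // apply the edit in memory
        } finally {
            rowLock.unlock();                 // released BEFORE the sync
        }
        sync(myIndex);                        // the slow step runs without the row lock
    }

    // Simplified: a real log would flush to disk here and would let new
    // appends proceed while a sync is in flight.
    private void sync(int upTo) {
        synchronized (logMonitor) {
            if (syncedUpTo < upTo) {
                syncedUpTo = logBuffer.size(); // one sync covers every buffered edit
            }
        }
    }

    public int syncedCount() {
        synchronized (logMonitor) {
            return syncedUpTo;
        }
    }

    public int memstoreSize() {
        rowLock.lock();
        try {
            return memstore.size();
        } finally {
            rowLock.unlock();
        }
    }
}
```

Because other writers to the same row only wait for the append and the in-memory update, several of their edits can share one sync. The sketch does not address what happens when the sync fails after the memstore has already been changed.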
[jira] [Created] (HBASE-4552) multi-CF bulk load is not atomic across column families
multi-CF bulk load is not atomic across column families
---
Key: HBASE-4552
URL: https://issues.apache.org/jira/browse/HBASE-4552
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Fix For: 0.92.0

Currently the bulk load API simply imports one HFile at a time. With multi-column-family support, this is inappropriate, since different CFs show up separately. Instead, the IPC endpoint should take a map of CF -> HFiles, so we can online them all under a single region-wide lock.
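A minimal sketch of the proposal, assuming the endpoint receives a map from column family name to HFile paths. The class AtomicBulkLoadSketch and its methods are hypothetical and are not the HBase bulk load API; strings stand in for HFiles and a ReentrantReadWriteLock stands in for the region-wide lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a region that onlines HFiles for all CFs at once. */
public class AtomicBulkLoadSketch {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> storeFiles = new HashMap<>();

    public AtomicBulkLoadSketch(List<String> families) {
        for (String cf : families) {
            storeFiles.put(cf, new ArrayList<>());
        }
    }

    /** Onlines every HFile in the map under one write lock, or none at all. */
    public void bulkLoad(Map<String, List<String>> familyToHFiles) {
        regionLock.writeLock().lock();
        try {
            // Validate first so a bad family leaves no CF partially loaded.
            for (String cf : familyToHFiles.keySet()) {
                if (!storeFiles.containsKey(cf)) {
                    throw new IllegalArgumentException("Unknown column family: " + cf);
                }
            }
            for (Map.Entry<String, List<String>> e : familyToHFiles.entrySet()) {
                storeFiles.get(e.getKey()).addAll(e.getValue());
            }
        } finally {
            regionLock.writeLock().unlock();
        }
    }

    /** Readers take the read lock, so they see a bulk load in all CFs or in none. */
    public Map<String, List<String>> snapshot() {
        regionLock.readLock().lock();
        try {
            Map<String, List<String>> copy = new HashMap<>();
            for (Map.Entry<String, List<String>> e : storeFiles.entrySet()) {
                copy.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
            return copy;
        } finally {
            regionLock.readLock().unlock();
        }
    }
}
```

Loading one HFile per call, by contrast, would let a reader observe the new data in one column family before it appears in another.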