[jira] [Commented] (HBASE-4964) Add builddate, make less sections in toc, and add header and footer customizations
[ https://issues.apache.org/jira/browse/HBASE-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163463#comment-13163463 ]

Hudson commented on HBASE-4964:
---

Integrated in HBase-TRUNK #2521 (See [https://builds.apache.org/job/HBase-TRUNK/2521/])
HBASE-4964 Add builddate, make less sections in toc, and add header and footer customizations (stack)

Files:
* /hbase/trunk/pom.xml
* /hbase/trunk/src/docbkx/book.xml
* /hbase/trunk/src/docbkx/customization.xsl

Add builddate, make less sections in toc, and add header and footer customizations
Key: HBASE-4964
URL: https://issues.apache.org/jira/browse/HBASE-4964
Project: HBase
Issue Type: Improvement
Reporter: stack
Fix For: 0.94.0
Attachments: 4964.txt

The customizations are for adding Facebook comments. I tried it, but it is not working for me yet; I need some XSL jujitsu to get the name of the current page into the footer. Also added a buildDate definition, in ISO-8601 format, to the pom used by the reference guide.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
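A buildDate in ISO-8601 form can be defined in a Maven pom roughly like this (a sketch using Maven's standard `maven.build.timestamp` mechanism; the property name `buildDate` is hypothetical and the actual patch may differ):

```xml
<properties>
  <!-- Make ${maven.build.timestamp} render in ISO-8601 form. -->
  <maven.build.timestamp.format>yyyy-MM-dd'T'HH:mm:ss'Z'</maven.build.timestamp.format>
  <!-- Hypothetical property name consumed by the docbkx build. -->
  <buildDate>${maven.build.timestamp}</buildDate>
</properties>
```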
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-4908:
--
Status: Patch Available (was: Open)

HBase cluster test tool (port from 0.89-fb)
Key: HBASE-4908
URL: https://issues.apache.org/jira/browse/HBASE-4908
Project: HBase
Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch

Porting one of our HBase cluster test tools (a single-process multi-threaded load generator and verifier) from 0.89-fb to trunk. I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that it has some features that I have not tried yet (some kind of a kill test, and some way to run HBase as multiple processes on one machine). The main utility of this piece of code for us has been the HBaseClusterTest command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a load test in our five-node dev cluster testing, e.g.:
{code}
hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression GZIP
{code}
I will be using this code to load-test the delta encoding patch and making fixes, but I am submitting the patch now for early feedback. I will probably try out its other functionality and comment on how it works.
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-4908:
--
Status: Open (was: Patch Available)

HBase cluster test tool (port from 0.89-fb)
Key: HBASE-4908
URL: https://issues.apache.org/jira/browse/HBASE-4908
Project: HBase
Issue Type: Sub-task
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch

Porting one of our HBase cluster test tools (a single-process multi-threaded load generator and verifier) from 0.89-fb to trunk. I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that it has some features that I have not tried yet (some kind of a kill test, and some way to run HBase as multiple processes on one machine). The main utility of this piece of code for us has been the HBaseClusterTest command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a load test in our five-node dev cluster testing, e.g.:
{code}
hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression GZIP
{code}
I will be using this code to load-test the delta encoding patch and making fixes, but I am submitting the patch now for early feedback. I will probably try out its other functionality and comment on how it works.
[jira] [Created] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
Monitor the open file descriptors and the threads counters during the unit tests
Key: HBASE-4965
URL: https://issues.apache.org/jira/browse/HBASE-4965
Project: HBase
Issue Type: Improvement
Components: test
Affects Versions: 0.94.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor

We're seeing a lot of issues with hadoop-qa related to threads or file descriptors. Monitoring these counters would ease the analysis. Note as well that:
- if we want to execute the tests in the same JVM (because the test is small or because we want to share the cluster), we can't afford to leak too many resources
- if the tests leak, it's more difficult to detect a leak in the software itself

I attach a piece of code that I used. It requires two lines in a unit test class to:
- before every test, count the threads and the open file descriptors
- after every test, compare with the previous values

I ran it on some tests; we have for example:
- client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut: 232 threads (was 231), 390 file descriptors (was 390). So TestMultiParallel uses 232 threads!
- client.TestMultipleTimestamps#testWithColumnDeletes: 152 threads (was 151), 283 file descriptors (was 282)
- client.TestAdmin#testCheckHBaseAvailableClosesConnection: 477 threads (was 294), 815 file descriptors (was 461)
- client.TestMetaMigrationRemovingHTD#testMetaMigration: 149 threads (was 148), 310 file descriptors (was 307)

These are not always leaks; we can expect some pooling effects. But still...
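The attached code is not shown in the digest, but the described before/after counting can be sketched in plain Java using the JDK's management beans (the class name `ResourceChecker` and its methods are hypothetical, not the actual attachment):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch of the helper described above: snapshot thread and
// file-descriptor counts before a test and compare after it.
public class ResourceChecker {
    private int initialThreads;
    private long initialFds;

    // Current live thread count in this JVM.
    public static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Open file descriptors, available on Unix-like JVMs only.
    public static long openFdCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                .getOpenFileDescriptorCount();
        }
        return -1;  // not available on this platform
    }

    // Call from a @Before method.
    public void before() {
        initialThreads = threadCount();
        initialFds = openFdCount();
    }

    // Call from an @After method; prints a line like the ones quoted above.
    public void after(String testName) {
        System.out.println(testName + ": " + threadCount() + " threads (was "
            + initialThreads + "), " + openFdCount() + " file descriptors (was "
            + initialFds + ")");
    }
}
```

A leak then shows up as the "after" numbers drifting upward across tests, as in the TestAdmin example above.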
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-4927:
---
Status: Open (was: Patch Available)

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
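The intended ordering can be illustrated with a small self-contained comparator (a simplification using Strings for keys; the real SplitParentFirstComparator in CatalogJanitor compares HRegionInfo byte[] keys): compare start keys normally, and for equal start keys treat an empty end key as the open boundary so the parent [A,) sorts before its left daughter [A,B).

```java
import java.util.Arrays;
import java.util.Comparator;

// Simplified illustration of the desired ordering, not the real
// SplitParentFirstComparator. A region is {startKey, endKey};
// "" stands for the open boundary.
public class SplitOrderDemo {
    static final Comparator<String[]> SPLIT_PARENT_FIRST = (a, b) -> {
        int cmp = a[0].compareTo(b[0]);   // start keys: plain ascending order
        if (cmp != 0) return cmp;
        if (a[1].equals(b[1])) return 0;
        if (a[1].isEmpty()) return -1;    // empty end key = parent, sorts first
        if (b[1].isEmpty()) return 1;
        return a[1].compareTo(b[1]);
    };

    public static void main(String[] args) {
        // Parent [A,), L daughter [A,B), R daughter [B,) in the broken order.
        String[][] regions = { {"A", "B"}, {"A", ""}, {"B", ""} };
        Arrays.sort(regions, SPLIT_PARENT_FIRST);
        for (String[] r : regions) {
            System.out.println("[" + r[0] + "," + r[1] + ")");
        }
        // Desired order: [A,) then [A,B) then [B,)
    }
}
```

Without the empty-end-key special case, "" compares as smaller than "B" and the parent lands between its daughters, which is the bug described above.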
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-4927:
---
Status: Patch Available (was: Open)

0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch is the latest patch. It replaced tabs with spaces.

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang updated HBASE-4927:
---
Attachment: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Created] (HBASE-4966) Put/Delete values cannot be tested with MRUnit
Put/Delete values cannot be tested with MRUnit
Key: HBASE-4966
URL: https://issues.apache.org/jira/browse/HBASE-4966
Project: HBase
Issue Type: Bug
Components: client, mapreduce
Affects Versions: 0.90.4
Reporter: Nicholas Telford
Priority: Minor

The IdentityTableReducer expects input values of either a Put or a Delete object, but testing the Mapper with MRUnit is not possible because neither Put nor Delete implements equals(). We should implement equals() on both such that equality means:
* Both objects are of the same class (in this case, Put or Delete)
* Both objects are for the same key.
* Both objects contain an equal set of KeyValues (applicable only to Put)

KeyValue.equals() appears to already be implemented, but it only checks equality of row key, column family and column qualifier - two KeyValues can be considered equal even if they contain different values. This won't work for testing. Instead, the Put.equals() and Delete.equals() implementations should do a deep equality check on their KeyValues, like this:
{code:java}
myKv.equals(theirKv) && Bytes.equals(myKv.getValue(), theirKv.getValue());
{code}
NOTE: This would impact any code that relies on the existing identity implementation of Put.equals() and Delete.equals(), and therefore cannot be guaranteed to be backwards-compatible.
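The three equality rules above can be sketched with self-contained stand-in classes (`Kv` and `SimplePut` are simplified illustrations, not the real org.apache.hadoop.hbase.KeyValue and Put, which carry timestamps and more state):

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for KeyValue: coordinates plus value.
class Kv {
    final byte[] row, family, qualifier, value;
    Kv(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        this.row = row; this.family = family;
        this.qualifier = qualifier; this.value = value;
    }
    // Deep equality: coordinates AND value must match.
    boolean deepEquals(Kv other) {
        return Arrays.equals(row, other.row)
            && Arrays.equals(family, other.family)
            && Arrays.equals(qualifier, other.qualifier)
            && Arrays.equals(value, other.value);
    }
}

// Simplified stand-in for Put, showing the proposed equals() contract.
class SimplePut {
    final byte[] row;
    final List<Kv> kvs;
    SimplePut(byte[] row, List<Kv> kvs) { this.row = row; this.kvs = kvs; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimplePut)) return false;      // same class
        SimplePut other = (SimplePut) o;
        if (!Arrays.equals(row, other.row)) return false; // same key
        if (kvs.size() != other.kvs.size()) return false;
        for (int i = 0; i < kvs.size(); i++) {            // equal KeyValues
            if (!kvs.get(i).deepEquals(other.kvs.get(i))) return false;
        }
        return true;
    }

    @Override
    public int hashCode() { return Arrays.hashCode(row); }
}
```

With this, two Puts built from the same row and cells compare equal, while Puts differing only in a cell value do not - exactly the distinction an MRUnit driver needs.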
[jira] [Updated] (HBASE-4966) Put/Delete values cannot be tested with MRUnit
[ https://issues.apache.org/jira/browse/HBASE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Telford updated HBASE-4966:
---
Description:

The IdentityTableReducer expects input values of either a Put or a Delete object, but testing the Mapper with MRUnit is not possible because neither Put nor Delete implements equals(). We should implement equals() on both such that equality means:
* Both objects are of the same class (in this case, Put or Delete)
* Both objects are for the same key.
* Both objects contain an equal set of KeyValues (applicable only to Put)

KeyValue.equals() appears to already be implemented, but it only checks equality of row key, column family and column qualifier - two KeyValues can be considered equal even if they contain different values. This won't work for testing. Instead, the Put.equals() and Delete.equals() implementations should do a deep equality check on their KeyValues, like this:
{code:java}
myKv.equals(theirKv) && Bytes.equals(myKv.getValue(), theirKv.getValue());
{code}
NOTE: This would impact any code that relies on the existing identity implementation of Put.equals() and Delete.equals(), and therefore cannot be guaranteed to be backwards-compatible.

(was: the same description without the {code:java} markup around the equality check)

Put/Delete values cannot be tested with MRUnit
Key: HBASE-4966
URL: https://issues.apache.org/jira/browse/HBASE-4966
Project: HBase
Issue Type: Bug
Components: client, mapreduce
Affects Versions: 0.90.4
Reporter: Nicholas Telford
Priority: Minor
[jira] [Created] (HBASE-4967) connected client thrift sockets should have a server side read timeout
connected client thrift sockets should have a server side read timeout
Key: HBASE-4967
URL: https://issues.apache.org/jira/browse/HBASE-4967
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani

If there is no socket read timeout and the Thrift server is a ThreadPoolServer, then server-side threads will be used up waiting for dead or unresponsive clients.
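The failure mode can be demonstrated with plain java.net sockets (a sketch; the actual fix would be applied to the Thrift server's socket transport rather than raw sockets): with a server-side read timeout set, a blocking read on a silent client throws SocketTimeoutException and frees the worker thread instead of parking it forever.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Why a server-side read timeout matters: without one, read() on a
// silent client blocks a worker thread indefinitely.
public class ReadTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0);  // ephemeral port
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(200);  // server-side read timeout, in ms
            try {
                accepted.getInputStream().read();  // client never writes
                System.out.println("unexpected data");
            } catch (SocketTimeoutException e) {
                System.out.println("read timed out; worker thread freed");
            }
        }
    }
}
```

In a thread-pool server, every connection without such a timeout is a worker thread held hostage by the slowest (or deadest) client.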
[jira] [Assigned] (HBASE-4967) connected client thrift sockets should have a server side read timeout
[ https://issues.apache.org/jira/browse/HBASE-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Khemani reassigned HBASE-4967:
--
Assignee: Prakash Khemani

connected client thrift sockets should have a server side read timeout
Key: HBASE-4967
URL: https://issues.apache.org/jira/browse/HBASE-4967
Project: HBase
Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani

If there is no socket read timeout and the Thrift server is a ThreadPoolServer, then server-side threads will be used up waiting for dead or unresponsive clients.
[jira] [Updated] (HBASE-4966) Put/Delete values cannot be tested with MRUnit
[ https://issues.apache.org/jira/browse/HBASE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4966:
---
Hadoop Flags: Incompatible change

Put/Delete values cannot be tested with MRUnit
Key: HBASE-4966
URL: https://issues.apache.org/jira/browse/HBASE-4966
Project: HBase
Issue Type: Bug
Components: client, mapreduce
Affects Versions: 0.90.4
Reporter: Nicholas Telford
Priority: Minor

The IdentityTableReducer expects input values of either a Put or a Delete object, but testing the Mapper with MRUnit is not possible because neither Put nor Delete implements equals(). We should implement equals() on both such that equality means:
* Both objects are of the same class (in this case, Put or Delete)
* Both objects are for the same key.
* Both objects contain an equal set of KeyValues (applicable only to Put)

KeyValue.equals() appears to already be implemented, but it only checks equality of row key, column family and column qualifier - two KeyValues can be considered equal even if they contain different values. This won't work for testing. Instead, the Put.equals() and Delete.equals() implementations should do a deep equality check on their KeyValues, like this:
{code:java}
myKv.equals(theirKv) && Bytes.equals(myKv.getValue(), theirKv.getValue());
{code}
NOTE: This would impact any code that relies on the existing identity implementation of Put.equals() and Delete.equals(), and therefore cannot be guaranteed to be backwards-compatible.
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4927:
---
Attachment: hbase-4927-fix-ws.txt

There were a few more hard tabs in the tests, and a couple of places with 4-space indentation instead of 2. Here's a fixed-up version.

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4927:
---
Status: Open (was: Patch Available)

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4927:
---
Status: Patch Available (was: Open)

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Assigned] (HBASE-4109) Hostname returned via reverse dns lookup contains trailing period if configured interface is not default
[ https://issues.apache.org/jira/browse/HBASE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-4109:
--
Assignee: Shrijeet Paliwal

Hostname returned via reverse dns lookup contains trailing period if configured interface is not default
Key: HBASE-4109
URL: https://issues.apache.org/jira/browse/HBASE-4109
Project: HBase
Issue Type: Bug
Components: master, regionserver
Affects Versions: 0.90.3
Reporter: Shrijeet Paliwal
Assignee: Shrijeet Paliwal
Fix For: 0.90.4
Attachments: 0001-HBASE-4109-Sanitize-hostname-returned-from-DNS-class.patch

If you are using an interface anything other than 'default' (literally that keyword), DNS.java's getDefaultHost will return a string with a trailing period at the end. It seems the javadoc of reverseDns in DNS.java (see below) conflicts with what the function actually does: it returns a PTR record while claiming to return a hostname. A PTR record always has a period at the end (RFC: http://irbs.net/bog-4.9.5/bog47.html).

We call DNS.getDefaultHost in more than one place and treat the result as the actual hostname. Quoting HRegionServer for example:
{code}
String machineName = DNS.getDefaultHost(
    conf.get("hbase.regionserver.dns.interface", "default"),
    conf.get("hbase.regionserver.dns.nameserver", "default"));
{code}
This causes inconsistencies. An example of such an inconsistency was observed while debugging the issue "Regions not getting reassigned if RS is brought down". More here: http://search-hadoop.com/m/CANUA1qRCkQ1

We may want to sanitize the string returned from the DNS class. Or, better, we could overhaul the way we do DNS name matching all over.
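The sanitization option amounts to stripping the trailing dot a PTR record carries before using the result as a hostname. A minimal sketch (hypothetical helper, not the attached patch):

```java
// Strip the trailing period that a PTR record carries (e.g.
// "host.example.com." -> "host.example.com") before using the
// reverse-DNS result as a hostname.
public class HostnameSanitizer {
    public static String sanitize(String reverseDnsResult) {
        if (reverseDnsResult != null && reverseDnsResult.endsWith(".")) {
            return reverseDnsResult.substring(0, reverseDnsResult.length() - 1);
        }
        return reverseDnsResult;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("regionserver1.example.com."));
        // prints regionserver1.example.com
    }
}
```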
[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163765#comment-13163765 ]

Hadoop QA commented on HBASE-4927:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12506284/0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 8 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -160 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
org.apache.hadoop.hbase.util.TestFSUtils
org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/453//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/453//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/453//console

This message is automatically generated.

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163785#comment-13163785 ]

Hadoop QA commented on HBASE-4927:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12506287/hbase-4927-fix-ws.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -160 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 71 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/454//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/454//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/454//console

This message is automatically generated.

CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
Key: HBASE-4927
URL: https://issues.apache.org/jira/browse/HBASE-4927
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
Fix For: 0.94.0
Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key):
Parent [A,), L daughter [A,B), R daughter [B,)
When sorted, we get to the end key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,)
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163819#comment-13163819 ] Ted Yu commented on HBASE-4946: --- The failed tests were due to 'unable to create new native thread' HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch Loading coprocessor jars from hdfs works fine. I load one from the shell, after setting the attribute, and it gets loaded: {noformat} INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Class com.MyCoprocessorClass needs to be loaded from a file - hdfs://localhost:9000/coproc/rt-0.0.1-SNAPSHOT.jar. INFO org.apache.hadoop.hbase.coprocessor.CoprocessorHost: loadInstance: com.MyCoprocessorClass INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: RegionEnvironment createEnvironment DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Registered protocol handler: region=t1,,1322572939753.6409aee1726d31f5e5671a59fe6e384f. protocol=com.MyCoprocessorClassProtocol INFO org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Load coprocessor com.MyCoprocessorClass from HTD of t1 successfully. {noformat} The problem is that this coprocessor simply extends BaseEndpointCoprocessor, with a dynamic method. When calling this method from the client with HTable.coprocessorExec, I get errors on the HRegionServer, because the call cannot be deserialized from writables.
The problem is that Exec tries to do an early resolve of the coprocessor class. The coprocessor class is loaded, but it is in the context of the HRegionServer / HRegion. So, the call fails: {noformat} 2011-12-02 00:34:17,348 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields java.io.IOException: Protocol class com.MyCoprocessorClassProtocol not found at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:125) at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:575) at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:105) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1237) at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495) at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:680) Caused by: java.lang.ClassNotFoundException: com.MyCoprocessorClassProtocol at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:943) at org.apache.hadoop.hbase.client.coprocessor.Exec.readFields(Exec.java:122) ... 
10 more {noformat} Probably the correct way to fix this is to make Exec really smart, so that it knows all the class definitions loaded in CoprocessorHost(s). I created a small patch that simply doesn't resolve the class definition in the Exec, instead passing it as string down to the HRegion layer. This layer knows all the definitions, and simply loads it by name. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see:
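The deferred-resolution idea the patch description outlines can be sketched as follows. This is an illustrative reconstruction, not the actual Exec/HRegion code: the class name DeferredResolveSketch and the two method names are hypothetical; only Class.forName is a real API.

```java
// Sketch of the fix direction described above (HBASE-4946): the RPC layer
// keeps the coprocessor protocol name as a plain String instead of resolving
// it eagerly, and the region layer -- which owns the class loader that loaded
// the coprocessor jar -- resolves it by name.
public class DeferredResolveSketch {

    // RPC deserialization side: do NOT call Class.forName here; the protocol
    // class may only be known to the coprocessor's own class loader.
    static String readProtocolName(String wireValue) {
        return wireValue;
    }

    // Region side: resolve with the loader that loaded the coprocessor jar.
    static Class<?> resolveProtocol(String protocolName, ClassLoader coprocessorLoader)
            throws ClassNotFoundException {
        return Class.forName(protocolName, true, coprocessorLoader);
    }

    public static void main(String[] args) throws Exception {
        String name = readProtocolName("java.lang.Runnable");
        Class<?> protocol = resolveProtocol(name, DeferredResolveSketch.class.getClassLoader());
        System.out.println(protocol.getName()); // prints java.lang.Runnable
    }
}
```

The ClassNotFoundException in the reported stack trace happens precisely because the eager lookup runs with the server's default class loader, which never saw the hdfs-loaded jar.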
[jira] [Commented] (HBASE-4946) HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to d
[ https://issues.apache.org/jira/browse/HBASE-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163840#comment-13163840 ] Ted Yu commented on HBASE-4946: --- For coprocessor/Exec.java, the javadoc doesn't match the code: {code} try { protocol = (Class<? extends CoprocessorProtocol>) conf.getClassByName(protocolName); } catch (ClassNotFoundException cnfe) { - throw new IOException("Protocol class " + protocolName + " not found", cnfe); + // can't do eager instantiation. pass it as a string and try to deserialize later. + //throw new IOException("Protocol class " + protocolName + " not found", cnfe); {code} I think the above try block should be commented out. TestCoprocessorEndpoint passes without the above assignment to protocol. HTable.coprocessorExec (and possibly coprocessorProxy) does not work with dynamically loaded coprocessors (from hdfs or local system), because the RPC system tries to deserialize an unknown class. - Key: HBASE-4946 URL: https://issues.apache.org/jira/browse/HBASE-4946 Project: HBase Issue Type: Bug Components: coprocessors Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Attachments: HBASE-4946-v2.patch, HBASE-4946-v3.patch, HBASE-4946.patch
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163853#comment-13163853 ] Ted Yu commented on HBASE-4120: --- From Andrew Purtell (see 'trip report for Hadoop In China' up on dev@): After 0.92 is out I intend to champion / mentor / co-develop 4120 and the follow on table allocation work and target 0.94 for it. I think the RPC QoS aspect is not too controversial. The allocation/reservation aspects I'd like to aim for a coprocessor or at least master plugin based integration so they won't impact stability for users who don't enable it. Unlike RPC QoS I suspect the changes needed to core can be minimized to coprocessor framework additions. Follow up in new JIRAs soon. isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resources among different applications and tables. A large HBase cluster with many applications running on it brings lots of problems. At Taobao there is a cluster where many departments test the performance of their HBase-based applications. With one 12-server cluster, only one application could run exclusively on it at a time, and the other applications had to wait until the previous test finished.
After we added the allocation management function to the cluster, applications can share the cluster and run concurrently. Also, if a test engineer wants to make sure there is no interference, he/she can move other tables out of the group. Within a group we use table priority to allocate resources; when the system is busy, we can make sure high-priority tables are not affected by lower-priority tables. Different groups can have different region server configurations: groups optimized for reading can have a large block cache, and groups optimized for writing can have a large memstore. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry: https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins
[ https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4847: - Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to TRUNK yesterday. Activate single jvm for small tests on jenkins -- Key: HBASE-4847 URL: https://issues.apache.org/jira/browse/HBASE-4847 Project: HBase Issue Type: Improvement Components: build, test Affects Versions: 0.94.0 Environment: build Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.94.0 Attachments: 4847_all.v10.patch, 4847_all.v10.patch, 4847_all.v10.patch, 4847_all.v11.patch, 4847_all.v11.patch, 4847_all.v12.patch, 4847_all.v4.patch, 4847_all.v5.patch, 4847_all.v6.patch, 4847_all.v6.patch, 4847_all.v7.patch, 4847_all.v7.patch, 4847_all.v7.patch, 4847_all.v7.patch, 4847_all.v8.patch, 4847_all.v8.patch, 4847_all.v9.patch, 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 4847_pom.v2.patch, 4847_pom.v3.patch This will not revolutionize performance on its own; we will gain between 1 and 4 minutes. But we also win because: - it's a step toward parallelizing the tests - new tests are cheaper since they do not create a new JVM: a continuing win - it lets developers use the same environment locally as on the build machines, and 3 minutes is about 10% of the small + medium test execution time. I will do a few 'submit patch' runs to see if it works well before asking for the real commit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4712) Document rules for writing tests
[ https://issues.apache.org/jira/browse/HBASE-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4712: - Attachment: 4712.txt Here is the nkeywal doc hacked into docbook. I also moved stuff around, breaking out a 'Tests' subsection under the developer chapter, and stuck Jesse's integration stuff in here too (with some attempt at lead-in text distinguishing the two). Document rules for writing tests Key: HBASE-4712 URL: https://issues.apache.org/jira/browse/HBASE-4712 Project: HBase Issue Type: Task Components: test Affects Versions: 0.92.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 4712.txt We saw that some tests could be improved. Documenting the general rules could help. Proposal: HBase tests are divided into three categories: small, medium and large, with corresponding JUnit categories: SmallTest, MediumTest, LargeTest. Small tests are executed in parallel in a shared JVM. They must last less than 15 seconds. They must NOT use a cluster. Medium tests are executed in a separate JVM. They must last less than 50 seconds. They can use a cluster. They must not fail occasionally. Small and medium tests must not need more than 30 minutes to run altogether. Small and medium tests should be executed by developers before submitting a patch. Large tests are everything else. They are typically integration tests, non-regression tests for specific bugs, timeout tests, performance tests. Test rule hints are: - As much as possible, tests should be written as small tests. - All tests should be written to support parallel execution on the same machine, hence should not use shared resources such as fixed ports or fixed file names. - All tests should be written to be as fast as possible. - Tests should not over-log: more than 100 lines/second makes the logs hard to read and uses I/O that is then unavailable to the other tests. - Tests can be written with HBaseTestingUtility.
This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster. - Sleeps: - Tests should not do a 'Thread.sleep' without testing an ending condition. This makes clear what the test is waiting for, and the test will work regardless of machine performance. - Sleeps should be minimal so tests are as fast as possible. Waiting for a variable should be done in a 40 ms sleep loop; waiting for a socket operation should be done in a 200 ms sleep loop. - Tests using a cluster: - Tests using an HRegion do not have to start a cluster: a region can use the local file system. - Starting/stopping a cluster costs around 10 seconds; clusters should be started per class, not per test method. - A started cluster must be shut down using HBaseTestingUtility#shutdownMiniCluster, which cleans the directories. - As much as possible, tests should use the default settings for the cluster; when they don't, they should document it. This will allow sharing the cluster later. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
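The sleep-loop rule in the proposal above can be sketched in plain Java: wait on an explicit ending condition in a short poll loop, with a timeout so a broken test fails fast instead of hanging. waitFor is an illustrative helper, not an HBase or JUnit API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch of the "no bare Thread.sleep" test rule: poll an ending condition
// in a short sleep loop instead of sleeping for a fixed duration.
public class WaitForConditionDemo {

    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;              // ending condition met: stop immediately
            }
            Thread.sleep(pollMs);         // e.g. 40 ms for a variable, 200 ms for a socket
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread t = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            done.set(true);
        });
        t.start();
        System.out.println(waitFor(done::get, 2000, 40)); // prints true
        t.join();
    }
}
```

On a fast machine the loop exits as soon as the condition flips; on a slow one it simply polls a few more times, which is exactly why such tests "work regardless of machine performance".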
[jira] [Resolved] (HBASE-4712) Document rules for writing tests
[ https://issues.apache.org/jira/browse/HBASE-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4712. -- Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Committed to trunk. Thanks for the doc. N. Document rules for writing tests Key: HBASE-4712 URL: https://issues.apache.org/jira/browse/HBASE-4712 Project: HBase Issue Type: Task Components: test Affects Versions: 0.92.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.94.0 Attachments: 4712.txt We saw that some tests could be improved. Documenting the general rules could help. Proposal: HBase tests are divided into three categories: small, medium and large, with corresponding JUnit categories: SmallTest, MediumTest, LargeTest. Small tests are executed in parallel in a shared JVM. They must last less than 15 seconds. They must NOT use a cluster. Medium tests are executed in a separate JVM. They must last less than 50 seconds. They can use a cluster. They must not fail occasionally. Small and medium tests must not need more than 30 minutes to run altogether. Small and medium tests should be executed by developers before submitting a patch. Large tests are everything else. They are typically integration tests, non-regression tests for specific bugs, timeout tests, performance tests. Test rule hints are: - As much as possible, tests should be written as small tests. - All tests should be written to support parallel execution on the same machine, hence should not use shared resources such as fixed ports or fixed file names. - All tests should be written to be as fast as possible. - Tests should not over-log: more than 100 lines/second makes the logs hard to read and uses I/O that is then unavailable to the other tests. - Tests can be written with HBaseTestingUtility. This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.
- Sleeps: - Tests should not do a 'Thread.sleep' without testing an ending condition. This makes clear what the test is waiting for, and the test will work regardless of machine performance. - Sleeps should be minimal so tests are as fast as possible. Waiting for a variable should be done in a 40 ms sleep loop; waiting for a socket operation should be done in a 200 ms sleep loop. - Tests using a cluster: - Tests using an HRegion do not have to start a cluster: a region can use the local file system. - Starting/stopping a cluster costs around 10 seconds; clusters should be started per class, not per test method. - A started cluster must be shut down using HBaseTestingUtility#shutdownMiniCluster, which cleans the directories. - As much as possible, tests should use the default settings for the cluster; when they don't, they should document it. This will allow sharing the cluster later. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163896#comment-13163896 ] stack commented on HBASE-4729: -- Actually commit the patches (Above I say I do but just noticed that I had not -- my schizophrenia is showing through again... pardon me). Clash between region unassign and splitting kills the master Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4120) isolation and allocation
[ https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163902#comment-13163902 ] Ted Yu commented on HBASE-4120: --- Andy suggested placing the PriorityFunction.initRegionPriority(region) call in RegionObserver.postOpen() isolation and allocation Key: HBASE-4120 URL: https://issues.apache.org/jira/browse/HBASE-4120 Project: HBase Issue Type: New Feature Components: master, regionserver Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0 Reporter: Liu Jia Assignee: Liu Jia Fix For: 0.94.0 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, Design_document_for_HBase_isolation_and_allocation_Revised.pdf, HBase_isolation_and_allocation_user_guide.pdf, Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, TablePriority_v12.patch, TablePriority_v12.patch, TablePriority_v8.patch, TablePriority_v8.patch, TablePriority_v8_for_trunk.patch, TablePrioriy_v9.patch The HBase isolation and allocation tool is designed to help users manage cluster resources among different applications and tables. A large HBase cluster with many applications running on it brings lots of problems. At Taobao there is a cluster where many departments test the performance of their HBase-based applications. With one 12-server cluster, only one application could run exclusively on it at a time, and the other applications had to wait until the previous test finished. After we added the allocation management function to the cluster, applications can share the cluster and run concurrently. Also, if a test engineer wants to make sure there is no interference, he/she can move other tables out of the group.
Within a group we use table priority to allocate resources; when the system is busy, we can make sure high-priority tables are not affected by lower-priority tables. Different groups can have different region server configurations: groups optimized for reading can have a large block cache, and groups optimized for writing can have a large memstore. Tables and region servers can be moved easily between groups; after changing the configuration, a group can be restarted alone instead of restarting the whole cluster. git entry: https://github.com/ICT-Ope/HBase_allocation . We hope our work is helpful. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4927: - Resolution: Fixed Fix Version/s: (was: 0.94.0) 0.92.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Applied to trunk and 0.92 branch. Thanks for the patch Jimmy. CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty --- Key: HBASE-4927 URL: https://issues.apache.org/jira/browse/HBASE-4927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.92.0 Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt When reviewing the HBASE-4238 backport, Jon found this issue. Consider these split regions (an empty end key marks the last region, an empty start key the first): Parent [A,), L daughter [A,B), R daughter [B,). When sorted, we get to the end key comparison, which yields this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163916#comment-13163916 ] stack commented on HBASE-4965: -- Where is the code N? Should we run this in all tests for a while till we nail some of the file descriptor issues up in hadoopqa patch build? Monitor the open file descriptors and the threads counters during the unit tests Key: HBASE-4965 URL: https://issues.apache.org/jira/browse/HBASE-4965 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor We're seeing a lot of issues with hadoop-qa related to threads or file descriptors. Monitoring these counters would ease the analysis. Note as well that - if we want to execute the tests in the same JVM (because the test is small or because we want to share the cluster) we can't afford to leak too many resources - if the tests leak, it's more difficult to detect a leak in the software itself. I attach a piece of code that I used. It requires two lines in a unit test class: - before every test, count the threads and the open file descriptors - after every test, compare with the previous values. I ran it on some tests; we have for example: - client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut: 232 threads (was 231), 390 file descriptors (was 390). = TestMultiParallel uses 232 threads! - client.TestMultipleTimestamps#testWithColumnDeletes: 152 threads (was 151), 283 file descriptors (was 282). - client.TestAdmin#testCheckHBaseAvailableClosesConnection: 477 threads (was 294), 815 file descriptors (was 461) - client.TestMetaMigrationRemovingHTD#testMetaMigration: 149 threads (was 148), 310 file descriptors (was 307). These are not always leaks; we can expect some pooling effects. But still... -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
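The before/after counting HBASE-4965 describes can be sketched with the standard JMX management beans. ResourceCheckerDemo is a hypothetical harness, not the attached patch; the open-file-descriptor count comes from com.sun.management.UnixOperatingSystemMXBean and is only exposed on Unix-like JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch of per-test resource monitoring: snapshot the thread and
// open-file-descriptor counts before a test and report the delta after,
// in the same "N threads (was M)" format quoted in the issue.
public class ResourceCheckerDemo {

    static int threadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1; // not exposed on this platform
    }

    public static void main(String[] args) {
        int threadsBefore = threadCount();
        long fdsBefore = openFileDescriptors();
        // ... the test body would run here ...
        System.out.printf("%d threads (was %d), %d file descriptors (was %d)%n",
                threadCount(), threadsBefore, openFileDescriptors(), fdsBefore);
    }
}
```

In a real JUnit setup the two snapshots would live in @Before/@After methods, which is the "two lines in a unit test class" the reporter mentions.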
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163923#comment-13163923 ] jirapos...@reviews.apache.org commented on HBASE-4605: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2579/#review3653 --- Answering most of the big issues on HBASE-4605 since there is a clean summary there by Ted. src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java https://reviews.apache.org/r/2579/#comment8150 Done src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java https://reviews.apache.org/r/2579/#comment8151 Done src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java https://reviews.apache.org/r/2579/#comment8153 this could definitely be an extension. +1 on future considerations. src/main/java/org/apache/hadoop/hbase/constraint/ConstraintException.java https://reviews.apache.org/r/2579/#comment8152 Done src/main/java/org/apache/hadoop/hbase/constraint/ConstraintProcessor.java https://reviews.apache.org/r/2579/#comment8154 done src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java https://reviews.apache.org/r/2579/#comment8147 I think we pulled getValueAsBytes() from HTD since this was the only use case. Agreed that it is a bit awkward, but we figured that since this is really the only use case for it, it didn't make sense to keep that function around. Adding this back in makes it easier to remove guava, since that ser/deser can be kept entirely under the covers. src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java https://reviews.apache.org/r/2579/#comment8148 +1 on using the configuration. Originally, this was done to ensure that the configuration could be completely open to the user, but constraining (no pun intended) the conf by a couple values is not a big deal. However, again we are at the question of keeping it human readable or not. See top comment. 
src/main/java/org/apache/hadoop/hbase/constraint/IntegerConstraint.java https://reviews.apache.org/r/2579/#comment8149 Moved to documentation. - Jesse On 2011-11-29 20:19:41, Jesse Yates wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2579/ bq. --- bq. bq. (Updated 2011-11-29 20:19:41) bq. bq. bq. Review request for hbase. bq. bq. bq. Summary bq. --- bq. bq. Most of the implementation for adding constraints as a coprocessor. bq. bq. Looking for general comments on style/structure, though nitpicks are ok too. bq. bq. Currently missing implementation for disableConstraints() since that will require adding removeCoprocessor() to HTD (also comments on if this is worth it would be good). bq. bq. bq. This addresses bug HBASE-4605. bq. https://issues.apache.org/jira/browse/HBASE-4605 bq. bq. bq. Diffs bq. - bq. bq.src/docbkx/book.xml 3c12169 bq.src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java 84a0d1a bq.src/main/java/org/apache/hadoop/hbase/constraint/BaseConstraint.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraint.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/constraint/ConstraintException.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/constraint/ConstraintProcessor.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/constraint/Constraints.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/constraint/IntegerConstraint.java PRE-CREATION bq.src/main/java/org/apache/hadoop/hbase/constraint/package-info.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/TestHTableDescriptor.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/constraint/AllFailConstraint.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/constraint/AllPassConstraint.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/constraint/CheckConfigurationConstraint.java PRE-CREATION bq. 
src/test/java/org/apache/hadoop/hbase/constraint/IntegrationTestConstraint.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/constraint/RuntimeFailConstraint.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/constraint/TestConstraints.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/constraint/TestIntegerConstraint.java PRE-CREATION bq.src/test/java/org/apache/hadoop/hbase/constraint/WorksConstraint.java PRE-CREATION bq. bq. Diff: https://reviews.apache.org/r/2579/diff bq. bq.
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163926#comment-13163926 ] Jesse Yates commented on HBASE-4605: @Gary: Guava - I didn't know about the effort to get rid of guava client side; it can be pulled if it is really a big issue. I just read through 3264 - is it the case that only the client side dependencies are used for the MR stuff? For this it works out really nicely for doing the auto-serializing (see changes to the CP stuff). We could say that setting bytes for a key is at your own risk, and make strings the default. getValueAsBytes() - this was -1'ed in a previous iteration as constraints was the only use case for it - wrapping it with the explicit functions seemed a decent solution. Configuration - As far as human readable, how often are people going to need to check those values? The HTD already just keeps around bytes, so people don't check those over the wire - constraints would remain in the same style. How would you feel about making that configurable? I'm thinking of setting a debug flag in Constraints/configuration for that value. check(Put) - this could definitely be an extension. +1 on future considerations. IntegerConstraint - In the end, I can let this go. You're right that having it in the docs should be enough. Also, combined with the fact that we don't have any others by default (and that removing is far harder than adding), I'll drop it into just the docs. @Ted: IntegrationTestConstraint - based on the recent discussion on dev@ about testing, this should be renamed to just TestConstraint and labeled as @MediumTest. The rest should be moved to @SmallTest. Atomicity - since per-row modifications are done essentially serially on the region, is this really that big of an issue? If not, then another jira isn't necessary, but maybe something to consider going forward. 
Taking the row lock is going to be a big slowdown - it just depends on what the common use case is going to be. (This reproduces some of what was commented on RB, but it is nice to have it succinctly.) Constraints --- Key: HBASE-4605 URL: https://issues.apache.org/jira/browse/HBASE-4605 Project: HBase Issue Type: Improvement Components: client, coprocessors Affects Versions: 0.94.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: 4605.v7, constraint_as_cp.txt, java_Constraint_v2.patch, java_HBASE-4605_v1.patch, java_HBASE-4605_v2.patch, java_HBASE-4605_v3.patch From Jesse's comment on dev: {quote} What I would like to propose is a simple interface that people can use to implement a 'constraint' (matching the classic database definition). This would help HBase more easily check that box, help minimize code duplication across organizations, and lead to easier adoption. Essentially, people would implement a 'Constraint' interface for checking keys before they are put into a table. Puts that are valid get written to the table; if not, an exception is thrown and propagated back to the client explaining why the put was invalid. Constraints would be set on a per-table basis and the user would be expected to ensure the jars containing the constraint are present on the machines serving that table. Yes, people could roll their own mechanism for doing this via coprocessors each time, but this would make it easier to do so, so you only have to implement a very minimal interface and not worry about the specifics. {quote} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
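The 'Constraint' idea quoted above can be sketched in a few lines. This is a hypothetical, simplified illustration; the Put stand-in, class names, and check() signature are assumptions for the sketch, not the actual HBASE-4605 API:

```java
// Hypothetical, simplified sketch of the per-table Constraint idea discussed
// in HBASE-4605 -- not the real API. "Put" here is a stand-in type.
import java.util.Arrays;
import java.util.List;

public class ConstraintSketch {
    // Stand-in for an HBase Put: a row key and a value.
    static class Put {
        final byte[] row;
        final byte[] value;
        Put(byte[] row, byte[] value) { this.row = row; this.value = value; }
    }

    // Exception propagated back to the client when a put is rejected.
    static class ConstraintException extends Exception {
        ConstraintException(String msg) { super(msg); }
    }

    // The minimal interface a user would implement.
    interface Constraint {
        void check(Put p) throws ConstraintException;
    }

    // Example constraint: reject values that do not parse as an integer.
    static class IntegerConstraint implements Constraint {
        public void check(Put p) throws ConstraintException {
            String s = new String(p.value);
            try {
                Integer.parseInt(s);
            } catch (NumberFormatException e) {
                throw new ConstraintException("value '" + s + "' is not an integer");
            }
        }
    }

    // What a coprocessor-style processor would do before writing each put.
    static boolean passes(List<Constraint> constraints, Put p) {
        for (Constraint c : constraints) {
            try {
                c.check(p);
            } catch (ConstraintException e) {
                return false; // in practice, rethrown to the client
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Constraint> cs = Arrays.asList(new IntegerConstraint());
        System.out.println(passes(cs, new Put("r1".getBytes(), "42".getBytes())));   // true
        System.out.println(passes(cs, new Put("r2".getBytes(), "oops".getBytes()))); // false
    }
}
```

The point of the design is that users implement only check(); the coprocessor plumbing (registration via HTD, jar distribution) stays hidden.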
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163925#comment-13163925 ] stack commented on HBASE-4880: -- Excellent detective work here, lads. I too would lean toward Rams' original suggestion: that we only online the region after the zk update and the edit to .meta. have gone in. Region is on service before completing openRegionHanlder, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch OpenRegionHandler in the regionserver proceeds in the following steps: {code} 1. openRegion() (through it, closed = false, closing = false) 2. addToOnlineRegions(region) 3. update .META. table 4. update ZK node state to RS_ZK_REGION_OPENED {code} The region is in service before step 4, which means a client could put data to this region after step 3. What happens if step 4 fails? OpenRegionHandler#cleanupFailedOpen executes, which closes the region, and the master assigns the region to another regionserver. If closing the region fails, data put between step 3 and step 4 may be lost, because the region has been opened on another regionserver and new data has been put there. The lost edits may not be recoverable through replayRecoveredEdit() because their LogSeqId is smaller than the current region SeqId.
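The reordering suggested above (online the region only after the .meta. edit and the zk update have gone in) can be modeled with a toy sketch. All names and the failure flag here are illustrative, not the real OpenRegionHandler code:

```java
// Toy model of the open-region sequence, with the proposed reordering: the
// region joins the online set only after the .META. edit and the ZK state
// transition have both succeeded. Illustrative only -- not HBase code.
import java.util.HashSet;
import java.util.Set;

public class OpenOrderSketch {
    final Set<String> onlineRegions = new HashSet<>(); // what clients can write to
    boolean zkUpdateSucceeds = true;                   // flip to simulate a step-4 failure

    boolean updateMeta(String region) { return true; }            // edit .META.
    boolean transitionZkToOpened(String region) { return zkUpdateSucceeds; }

    // Proposed ordering: open -> meta -> ZK -> online (online comes last).
    boolean openRegion(String region) {
        // 1. open the region internals (closed = false, closing = false)
        // 2. update .META.
        if (!updateMeta(region)) return false;
        // 3. update ZK node state to RS_ZK_REGION_OPENED
        if (!transitionZkToOpened(region)) {
            // cleanupFailedOpen: nothing to lose -- no client writes were accepted
            return false;
        }
        // 4. only now let clients see the region
        onlineRegions.add(region);
        return true;
    }

    public static void main(String[] args) {
        OpenOrderSketch rs = new OpenOrderSketch();
        rs.zkUpdateSucceeds = false;
        rs.openRegion("r1");
        // The region never went online, so no writes could land between the
        // failed ZK update and the reassignment to another regionserver.
        System.out.println(rs.onlineRegions.isEmpty()); // prints "true"
    }
}
```

With the original ordering (online at step 2), a failure at step 4 leaves a window where accepted writes can be lost; with this ordering the window disappears, at the cost of the split-path issue discussed later in the thread.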
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163928#comment-13163928 ] stack commented on HBASE-4880: -- I took a look at the patch. Yeah, it's a bit messy but a nice first cut at the problem.
[jira] [Updated] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-4965: --- Attachment: ResourceCheckerJUnitRule.java ResourceChecker.java Monitor the open file descriptors and the threads counters during the unit tests Key: HBASE-4965 URL: https://issues.apache.org/jira/browse/HBASE-4965 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: ResourceChecker.java, ResourceCheckerJUnitRule.java We're seeing a lot of issues with hadoop-qa related to threads or file descriptors. Monitoring these counters would ease the analysis. Note as well that - if we want to execute the tests in the same jvm (because the test is small or because we want to share the cluster) we can't afford to leak too many resources - if the tests leak, it's more difficult to detect a leak in the software itself. I attach a piece of code that I used. It requires two lines in a unit test class to: - before every test, count the threads and the open file descriptors - after every test, compare with the previous values. I ran it on some tests; we have for example: - client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut: 232 threads (was 231), 390 file descriptors (was 390). => TestMultiParallel uses 232 threads! - client.TestMultipleTimestamps#testWithColumnDeletes: 152 threads (was 151), 283 file descriptors (was 282). - client.TestAdmin#testCheckHBaseAvailableClosesConnection: 477 threads (was 294), 815 file descriptors (was 461) - client.TestMetaMigrationRemovingHTD#testMetaMigration: 149 threads (was 148), 310 file descriptors (was 307). It's not always leaks; we can expect some pooling effects. But still...
[jira] [Commented] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163937#comment-13163937 ] nkeywal commented on HBASE-4965: Oops. Sorry. Here they are. To use them, we need to add the following lines in the test code {noformat} @Rule public ResourceCheckerJUnitRule cu = new ResourceCheckerJUnitRule(); {noformat} It's the least intrusive way I found. Before and after each test method, we count the number of threads and the number of open file handles, and log them. Unfortunately, I found a bug in surefire: these lines are not stored when the redirect-to-file option is activated. I fixed it locally. Despite this, I think it makes sense to track this data. It will ease the analysis when something goes wrong, even if fixing all the current leaks would take quite a lot of time.
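The before/after counting that the attached ResourceChecker and ResourceCheckerJUnitRule perform might look roughly like this standalone sketch. This is illustrative only: the class and method names are assumptions, and the file-descriptor count relies on the platform-specific com.sun.management.UnixOperatingSystemMXBean (it returns -1 on other platforms):

```java
// Stripped-down sketch of before/after resource counting around a test.
// Illustrative only -- the real classes are attached to HBASE-4965.
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

public class ResourceCheckSketch {
    static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    int threadsBefore;
    long fdsBefore;

    // Open file descriptors are only exposed on Unix-like JVMs; -1 elsewhere.
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Record the counters before a test runs.
    void before(String testName) {
        threadsBefore = THREADS.getThreadCount();
        fdsBefore = openFileDescriptors();
        System.out.println("before " + testName + ": " + threadsBefore
            + " threads, " + fdsBefore + " file descriptors");
    }

    // Compare with the recorded counters after the test and flag growth.
    void after(String testName) {
        int threadsAfter = THREADS.getThreadCount();
        long fdsAfter = openFileDescriptors();
        System.out.println("after " + testName + ": " + threadsAfter
            + " threads (was " + threadsBefore + "), " + fdsAfter
            + " file descriptors (was " + fdsBefore + ")."
            + (threadsAfter > threadsBefore ? " -thread leak?-" : "")
            + (fdsAfter > fdsBefore ? " -file handle leak?-" : ""));
    }

    public static void main(String[] args) {
        ResourceCheckSketch rc = new ResourceCheckSketch();
        rc.before("demoTest");
        // ... the test body would run here ...
        rc.after("demoTest");
    }
}
```

In the real attachment this before/after pair is wired to JUnit's @Rule mechanism, so the two lines shown in the comment above are enough to wrap every test method in the class.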
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163939#comment-13163939 ] Ted Yu commented on HBASE-4605: --- Renaming IntegrationTestConstraint as described above makes sense. Thanks Jesse.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163943#comment-13163943 ] Zhihong Yu commented on HBASE-4880: --- TestAdmin#testForceSplit would fail if I comment out the following line in HRegionServer.postOpenDeployTasks(): {code} -addToOnlineRegions(r); +// addToOnlineRegions(r); {code} This is the error: {code} org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NotServingRegionException: 1 time, servers with issues: lm-sjn-xx:65442, at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1641) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892) at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725) at org.apache.hadoop.hbase.client.TestAdmin.splitTest(TestAdmin.java:805) at org.apache.hadoop.hbase.client.TestAdmin.testForceSplit(TestAdmin.java:108) {code}
[jira] [Commented] (HBASE-4605) Constraints
[ https://issues.apache.org/jira/browse/HBASE-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163944#comment-13163944 ] Zhihong Yu commented on HBASE-4605: --- w.r.t. atomicity, it is outside the initial scope of this JIRA.
[jira] [Commented] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163946#comment-13163946 ] Phabricator commented on HBASE-4908: stack has commented on the revision [jira] [HBASE-4908] HBase cluster test tool (port from 0.89-fb). This is looking great, Mikhail. INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java:28 When would I want one of these? src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java:176 Is this from another issue, Mikhail? (No matter if it is... we can take care of it on commit). src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java:37 Thanks for doing this. src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java:111 Thanks for fixing this. src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:1798 Good src/test/java/org/apache/hadoop/hbase/util/IntegrationTestTool.java:39 Will this be run as an IntegrationTest because it has the IntegrationTest prefix (see the Integration Test section on this page, http://hbase.apache.org/book/hbase.tests.html, if you don't have a clue what I'm on about)? I like the idea of this class. We need it. Nice how you subclass Tool. src/test/java/org/apache/hadoop/hbase/util/LoadTest.java:40 Should this class have the IntegrationTest prefix, or do you think this is just a tool, not part of IntegrationTests? Since it's sitting beside the PerformanceEvaluation tool, should you say something on how it differs from it? src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java:37 Should this be an IntegrationTest? (We can do the conversion in another issue). 
REVISION DETAIL https://reviews.facebook.net/D549 HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin
[jira] [Commented] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163947#comment-13163947 ] stack commented on HBASE-4965: -- So, you want to add it to the running of all current tests? Sounds good to me. (The classes are missing license headers and class comments explaining what they are.) Do we have to add the @Rule to every test method, or just to each test class?
[jira] [Commented] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163951#comment-13163951 ] stack commented on HBASE-4965: -- Where does the output show? In the test output? Good stuff.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163953#comment-13163953 ] stack commented on HBASE-4880: -- @Zhihong We need to fix the above split issue if we refactor so we only put the region online as the last step?
[jira] [Commented] (HBASE-4960) Document mutual authentication between HBase and Zookeeper using SASL
[ https://issues.apache.org/jira/browse/HBASE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163961#comment-13163961 ] stack commented on HBASE-4960: -- That's some nice doc you made there, Eugene. Committing. Document mutual authentication between HBase and Zookeeper using SASL - Key: HBASE-4960 URL: https://issues.apache.org/jira/browse/HBASE-4960 Project: HBase Issue Type: Sub-task Components: documentation, security Reporter: Eugene Koontz Assignee: Eugene Koontz Labels: documentation, security Attachments: HBASE-4960.patch, HBASE-4960.patch Provide documentation for the work done in HBASE-2418 (add support for ZooKeeper authentication).
[jira] [Updated] (HBASE-4960) Document mutual authentication between HBase and Zookeeper using SASL
[ https://issues.apache.org/jira/browse/HBASE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4960: - Resolution: Fixed Fix Version/s: 0.92.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to TRUNK though marked as 0.92.0. Will copy the book over to the branch just before we cut the next 0.92.0 RC. Thanks Eugene. Nice doc. Document mutual authentication between HBase and Zookeeper using SASL - Key: HBASE-4960 URL: https://issues.apache.org/jira/browse/HBASE-4960 Project: HBase Issue Type: Sub-task Components: documentation, security Reporter: Eugene Koontz Assignee: Eugene Koontz Labels: documentation, security Fix For: 0.92.0 Attachments: HBASE-4960.patch, HBASE-4960.patch Provide documentation for the work done in HBASE-2418 (add support for ZooKeeper authentication).
[jira] [Commented] (HBASE-4965) Monitor the open file descriptors and the threads counters during the unit tests
[ https://issues.apache.org/jira/browse/HBASE-4965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163966#comment-13163966 ] nkeywal commented on HBASE-4965: It's one rule for each test class. With a fixed surefire, it shows as a standard log in the output. For example: {noformat} 2011-12-06 15:03:32,982 INFO [main] hbase.HBaseTestingUtility(518): Starting up minicluster with 1 master(s) and 3 regionserver(s) and 3 datanode(s) 2011-12-06 15:03:34,052 WARN [main] impl.MetricsSystemImpl(137): Metrics system not started: Cannot locate configuration: tried hadoop-metrics2-namenode.properties, hadoop-metrics2.properties [...] 2011-12-06 15:03:41,587 DEBUG [main] client.HTable$ClientScanner(1183): Finished with scanning at {NAME => '.META.,,1', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,} 2011-12-06 15:03:41,588 INFO [main] hbase.HBaseTestingUtility(561): Minicluster is up 2011-12-06 15:03:41,588 INFO [main] client.HConnectionManager$HConnectionImplementation(1805): Closed zookeeper sessionid=0x134159e31930008 2011-12-06 15:03:41,661 INFO [main] hbase.ResourceChecker(117): before org.apache.hadoop.hbase.client.TestAdmin#testDeleteEditUnknownColumnFamilyAndOrTable: 247 threads, 417 file descriptors [...] 2011-12-06 15:03:43,282 INFO [main] client.HConnectionManager$HConnectionImplementation(1805): Closed zookeeper sessionid=0x134159e31930009 2011-12-06 15:03:43,313 INFO [main] hbase.ResourceChecker(117): after org.apache.hadoop.hbase.client.TestAdmin#testDeleteEditUnknownColumnFamilyAndOrTable: 265 threads (was 247), 450 file descriptors (was 417). -thread leak?- -file handle leak?- [...] {noformat} If you're ok with the idea, I will professionalize the code a little and propose it as a patch. 
Monitor the open file descriptors and the threads counters during the unit tests Key: HBASE-4965 URL: https://issues.apache.org/jira/browse/HBASE-4965 Project: HBase Issue Type: Improvement Components: test Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: ResourceChecker.java, ResourceCheckerJUnitRule.java We're seeing a lot of issues with hadoop-qa related to threads or file descriptors. Monitoring these counters would ease the analysis. Note as well that: - if we want to execute the tests in the same jvm (because the test is small or because we want to share the cluster), we can't afford to leak too many resources - if the tests leak, it's more difficult to detect a leak in the software itself. I attach a piece of code that I used. It requires two lines in a unit test class to: - before every test, count the threads and the open file descriptors - after every test, compare with the previous values. I ran it on some tests; we have, for example: - client.TestMultiParallel#testBatchWithManyColsInOneRowGetAndPut: 232 threads (was 231), 390 file descriptors (was 390). => TestMultiParallel uses 232 threads! - client.TestMultipleTimestamps#testWithColumnDeletes: 152 threads (was 151), 283 file descriptors (was 282). - client.TestAdmin#testCheckHBaseAvailableClosesConnection: 477 threads (was 294), 815 file descriptors (was 461) - client.TestMetaMigrationRemovingHTD#testMetaMigration: 149 threads (was 148), 310 file descriptors (was 307). These are not all leaks; we can expect some pooling effects. But still...
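The before/after snapshot idea above can be sketched in plain Java (this is our sketch, not the attached ResourceChecker.java). Thread counts come from the standard JMX ThreadMXBean; the real attachment also counts open file descriptors, which needs a platform-specific MXBean and is omitted here:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/**
 * Minimal sketch of a per-test resource snapshot: record counters before a
 * test, compare after it, and flag growth as a possible leak.
 */
public class ResourceSnapshot {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    private int threadsBefore;

    /** Call before every test: record the current live-thread count. */
    public void before(String testName) {
        threadsBefore = THREADS.getThreadCount();
        System.out.println("before " + testName + ": " + threadsBefore + " threads");
    }

    /** Call after every test: compare with the recorded value and flag growth. */
    public void after(String testName) {
        int now = THREADS.getThreadCount();
        System.out.println("after " + testName + ": " + now + " threads (was "
                + threadsBefore + ")" + (now > threadsBefore ? " -thread leak?-" : ""));
    }

    public int threadsBefore() { return threadsBefore; }
}
```

In the attached ResourceCheckerJUnitRule, the same pair of calls would be driven by a JUnit @Rule, which is what "two lines in a unit test class" refers to.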
[jira] [Updated] (HBASE-4937) Error in Quick Start Shell Exercises
[ https://issues.apache.org/jira/browse/HBASE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4937: - Attachment: 4937.txt Patch to fix Ryan's observation. Error in Quick Start Shell Exercises Key: HBASE-4937 URL: https://issues.apache.org/jira/browse/HBASE-4937 Project: HBase Issue Type: Bug Components: documentation Reporter: Ryan Berdeen Attachments: 4937.txt The shell exercises in the Quick Start (http://hbase.apache.org/book/quickstart.html) start with {code} hbase(main):003:0> create 'test', 'cf' 0 row(s) in 1.2200 seconds hbase(main):003:0> list 'table' test 1 row(s) in 0.0550 seconds {code} It looks like the second command is wrong. Running it, the actual output is {code} hbase(main):001:0> create 'test', 'cf' 0 row(s) in 0.3630 seconds hbase(main):002:0> list 'table' TABLE 0 row(s) in 0.0100 seconds {code} The argument to list should be 'test', not 'table', and the output in the example is missing the {{TABLE}} line.
[jira] [Resolved] (HBASE-4937) Error in Quick Start Shell Exercises
[ https://issues.apache.org/jira/browse/HBASE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4937. -- Resolution: Fixed Fix Version/s: 0.94.0 Assignee: stack Committed to trunk. Thanks for the doc bug report, Ryan. Error in Quick Start Shell Exercises Key: HBASE-4937 URL: https://issues.apache.org/jira/browse/HBASE-4937 Project: HBase Issue Type: Bug Components: documentation Reporter: Ryan Berdeen Assignee: stack Fix For: 0.94.0 Attachments: 4937.txt The shell exercises in the Quick Start (http://hbase.apache.org/book/quickstart.html) start with {code} hbase(main):003:0> create 'test', 'cf' 0 row(s) in 1.2200 seconds hbase(main):003:0> list 'table' test 1 row(s) in 0.0550 seconds {code} It looks like the second command is wrong. Running it, the actual output is {code} hbase(main):001:0> create 'test', 'cf' 0 row(s) in 0.3630 seconds hbase(main):002:0> list 'table' TABLE 0 row(s) in 0.0100 seconds {code} The argument to list should be 'test', not 'table', and the output in the example is missing the {{TABLE}} line.
[jira] [Updated] (HBASE-4936) Cached HRegionInterface connections crash when getting UnknownHost exceptions
[ https://issues.apache.org/jira/browse/HBASE-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4936: - Resolution: Fixed Fix Version/s: 0.94.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to TRUNK. Thanks Andrei. Cached HRegionInterface connections crash when getting UnknownHost exceptions - Key: HBASE-4936 URL: https://issues.apache.org/jira/browse/HBASE-4936 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Reporter: Andrei Dragomir Assignee: Andrei Dragomir Fix For: 0.94.0 Attachments: HBASE-4936-v2.patch, HBASE-4936.patch This issue is unlikely to come up in a cluster test case. However, for development, the following happens: 1. Start the HBase cluster locally, on network A (DNS A, etc.) 2. The region locations are cached using the hostname (mycomputer.company.com, 211.x.y.z - real ip) 3. Change network location (go home) 4. Start the HBase cluster locally. My hostname / ips are now different (mycomputer, 192.168.0.130 - new ip) If the region locations have been cached using the hostname, there is an UnknownHostException in CatalogTracker.getCachedConnection(ServerName sn), uncaught in the catch statements. The server will crash constantly. The error should be caught and not rethrown, so that the cached connection expires normally.
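The fix described above — catch the UnknownHostException instead of letting it escape, and let the stale cache entry expire — can be modeled with a toy cache (plain Java, not the HBase CatalogTracker API; the Resolver interface and method names are ours):

```java
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the HBASE-4936 fix: on UnknownHostException, evict the cached
 * connection instead of rethrowing and crashing the server.
 */
public class ConnectionCache {
    /** Fake resolver so the "went home, new network" scenario can be simulated. */
    public interface Resolver { void resolve(String host) throws UnknownHostException; }

    private final Map<String, String> cache = new HashMap<>(); // hostname -> connection

    public String connect(String hostname, Resolver resolver) {
        try {
            resolver.resolve(hostname);
            return cache.computeIfAbsent(hostname, h -> "conn-to-" + h);
        } catch (UnknownHostException e) {
            cache.remove(hostname); // evict: the cached connection expires normally
            return null;
        }
    }

    public boolean isCached(String hostname) { return cache.containsKey(hostname); }
}
```

The key point is that the exception is handled at the cache layer, so a hostname that stopped resolving cannot take the whole server down.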
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163980#comment-13163980 ] stack commented on HBASE-4951: -- I believe this is fixed in 0.92. Won't close till I prove it. master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Priority: Critical Fix For: 0.92.0, 0.90.5 It is easy to reproduce with the following steps: step 1: start the master process (do not start a regionserver process in the cluster); the master will wait for the regionserver(s) to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step 2: stop the master with the shell command bin/hbase master stop. Result: the master process will never die, because the catalogTracker.waitForRoot() method blocks until the root region is assigned.
[jira] [Resolved] (HBASE-4954) IllegalArgumentException in hfile2 blockseek
[ https://issues.apache.org/jira/browse/HBASE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roman Shaposhnik resolved HBASE-4954. - Resolution: Won't Fix IllegalArgumentException in hfile2 blockseek Key: HBASE-4954 URL: https://issues.apache.org/jira/browse/HBASE-4954 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Fix For: 0.92.0 On Tue, Nov 29, 2011 at 10:20 PM, Stack st...@duboce.net wrote: The first hbase 0.92.0 release candidate is available for download: http://people.apache.org/~stack/hbase-0.92.0-candidate-0/ Here's another persistent issue that I'd appreciate somebody taking a quick look at: http://bigtop01.cloudera.org:8080/view/Hadoop%200.22/job/Bigtop-hadoop22-smoketest/28/testReport/org.apache.bigtop.itest.hbase.smoke/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/ Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.blockSeek(HFileReaderV2.java:632) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:545) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:503) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:511) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:475) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:157) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.copyHFileHalf(LoadIncrementalHFiles.java:544) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:516) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.splitStoreFile(LoadIncrementalHFiles.java:377) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:441) at 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:325) at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:323) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)
[jira] [Commented] (HBASE-4712) Document rules for writing tests
[ https://issues.apache.org/jira/browse/HBASE-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164013#comment-13164013 ] Hudson commented on HBASE-4712: --- Integrated in HBase-TRUNK #2522 (See [https://builds.apache.org/job/HBase-TRUNK/2522/]) HBASE-4712 Document rules for writing tests stack : Files : * /hbase/trunk/src/docbkx/developer.xml Document rules for writing tests Key: HBASE-4712 URL: https://issues.apache.org/jira/browse/HBASE-4712 Project: HBase Issue Type: Task Components: test Affects Versions: 0.92.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.94.0 Attachments: 4712.txt We saw that some tests could be improved. Documenting the general rules could help. Proposal: HBase tests are divided into three categories: small, medium, and large, with corresponding JUnit categories: SmallTest, MediumTest, LargeTest. Small tests are executed in parallel in a shared JVM. They must last less than 15 seconds. They must NOT use a cluster. Medium tests are executed in a separate JVM. They must last less than 50 seconds. They can use a cluster. They must not fail occasionally. Small and medium tests must not need more than 30 minutes to run altogether, and should be executed by the developers before submitting a patch. Large tests are everything else: typically integration tests, non-regression tests for specific bugs, timeout tests, performance tests. Test rule hints are: - As much as possible, tests should be written as small tests. - All tests should be written to support parallel execution on the same machine, hence should not use shared resources such as fixed ports or fixed file names. - All tests should be written to be as fast as possible. - Tests should not over-log: more than 100 lines/second makes the logs hard to read and uses I/O that is then not available to the other tests. - Tests can be written with HBaseTestingUtility. 
This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster. - Sleeps: - Tests should not do a 'Thread.sleep' without testing an ending condition. This makes it clear what the test is waiting for; moreover, the test will work whatever the machine's performance. - Sleeps should be minimal, to be as fast as possible. Waiting for a variable should be done in a 40 ms sleep loop; waiting for a socket operation should be done in a 200 ms sleep loop. - Tests using a cluster: - Tests using an HRegion do not have to start a cluster: a region can use the local file system. - Starting/stopping a cluster costs around 10 seconds, so clusters should be started per class, not per test method. - A started cluster must be shut down using HBaseTestingUtility#shutdownMiniCluster, which cleans the directories. - As much as possible, tests should use the default settings for the cluster; when they don't, they should document it. This will allow sharing the cluster later.
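The "no bare Thread.sleep" rule above can be sketched as a small helper (our sketch, assuming a plain poll-until-deadline design; the class and method names are ours, not HBase's): wait for an explicit ending condition in a short sleep loop, using the guideline's 40 ms figure for in-memory variables, instead of sleeping a fixed, machine-dependent amount:

```java
import java.util.function.BooleanSupplier;

/** Sketch of a condition-based wait for tests, replacing bare Thread.sleep. */
public final class Waiter {
    private Waiter() {}

    /** Polls cond every pollMs until it holds or timeoutMs elapses; true if it held. */
    public static boolean waitFor(BooleanSupplier cond, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timed out: the test can now fail with a clear message
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return cond.getAsBoolean();
            }
        }
        return true;
    }
}
```

A test would call `Waiter.waitFor(() -> regionIsOnline(), 10000, 40)` and assert the result, so a fast machine finishes early and a slow one still passes.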
[jira] [Commented] (HBASE-4927) CatalogJanitor:SplitParentFirstComparator doesn't sort as expected for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164015#comment-13164015 ] Hudson commented on HBASE-4927: --- Integrated in HBase-TRUNK #2522 (See [https://builds.apache.org/job/HBase-TRUNK/2522/]) HBASE-4927 CatalogJanitor:SplitParentFirstComparator doesn't sort as expected for the last region when the endkey is empty stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java CatalogJanitor:SplitParentFirstComparator doesn't sort as expected for the last region when the endkey is empty --- Key: HBASE-4927 URL: https://issues.apache.org/jira/browse/HBASE-4927 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.92.0, 0.94.0 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.92.0 Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt When reviewing the HBASE-4238 backport, Jon found this issue. Consider these split regions (an empty end key marks the last region, an empty start key the first): parent [A,), L daughter [A,B), R daughter [B,). When sorted, we get to the end-key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,).
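The sort the issue wants can be modeled with an illustrative comparator (names and types are ours, not HBase's): an empty end key means "last region", i.e. +infinity, so for equal start keys the wider split parent [A,) must sort before its daughter [A,B):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of the split-parent-first ordering with empty-endkey handling. */
public class SplitParentFirst {
    public record Region(String start, String end) {}

    /** An empty end key compares greater than any concrete end key. */
    static int compareEndKeys(String a, String b) {
        if (a.equals(b)) return 0;
        if (a.isEmpty()) return 1;
        if (b.isEmpty()) return -1;
        return a.compareTo(b);
    }

    /** Start keys ascending; for ties, the wider region (larger end key) first. */
    public static final Comparator<Region> SPLIT_PARENT_FIRST =
        Comparator.comparing(Region::start)
                  .thenComparing((a, b) -> -compareEndKeys(a.end(), b.end()));

    public static List<Region> sorted(List<Region> in) {
        List<Region> out = new ArrayList<>(in);
        out.sort(SPLIT_PARENT_FIRST);
        return out;
    }
}
```

The bug amounts to treating the empty end key as the smallest value (plain lexicographic compare) instead of as infinity, which is exactly what yields [A,B), [A,), [B,).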
[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164014#comment-13164014 ] Hudson commented on HBASE-4729: --- Integrated in HBase-TRUNK #2522 (See [https://builds.apache.org/job/HBase-TRUNK/2522/]) HBASE-4729 Clash between region unassign and splitting kills the master stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java Clash between region unassign and splitting kills the master Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). 
What killed the master: {quote} 2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769) at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722) at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661) at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {quote} A znode was created because the region server was splitting the region 4 seconds before: {quote} 2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 
2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state 2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING ... 2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT 2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101 {quote} Now that the master is dead the region server is spewing those last two lines like mad.
[jira] [Commented] (HBASE-4844) Coprocessor hooks for log rolling
[ https://issues.apache.org/jira/browse/HBASE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164022#comment-13164022 ] Lars Hofhansl commented on HBASE-4844: -- bq. The logroller could signal new log to copy? Right, and it could trigger a coprocessor hook to do the actual work of archiving. The coprocessor would get the path to the old file and then copy it somewhere else. Looking at the code, there are races, though. Until HLog.writer is set to the new writer, all writes still go to the old file. So if the coprocessor post hook runs before that and makes a copy of the file, some edits might be missed (those that go into the old file after it was copied but before the writer was switched over). Wouldn't it be nice if we had hardlinks in HDFS? :) So I think the coprocessor post hook should be called after the HLog.writer assignment. If it did the copy synchronously, it would only need to finish before the next log for the same regionserver is rolled (still a race, though). I'll attach a very simple patch tonight or tomorrow morning and then folks can poke holes in it. Coprocessor hooks for log rolling - Key: HBASE-4844 URL: https://issues.apache.org/jira/browse/HBASE-4844 Project: HBase Issue Type: New Feature Affects Versions: 0.94.0 Reporter: Lars Hofhansl Priority: Minor In order to eventually do point in time recovery we need a way to reliably back up the logs. Rather than adding some hard coded changes, we can provide coprocessor hooks and folks can implement their own policies. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
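The ordering Lars argues for can be sketched with hypothetical names (this is our model, not the HBase HLog/coprocessor API): assign the new writer first, then fire the post-roll hook with the old file's path, so no edit can land in the old file after the hook has copied it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of rolling a write-ahead log with a post-roll archiving hook. */
public class LogRoller {
    private String currentLog;
    private final List<String> editsInCurrent = new ArrayList<>();

    public LogRoller(String firstLog) { this.currentLog = firstLog; }

    /** Appends always go to whatever currentLog points at right now. */
    public void append(String edit) { editsInCurrent.add(edit); }

    /** Roll: switch the writer BEFORE invoking the hook, per the comment above. */
    public String roll(String newLog, Consumer<String> postLogRollHook) {
        String oldLog = currentLog;
        currentLog = newLog;            // from here on, appends go to the new file
        editsInCurrent.clear();
        postLogRollHook.accept(oldLog); // safe: oldLog can no longer receive edits
        return oldLog;
    }

    public String currentLog() { return currentLog; }
}
```

Calling the hook before the writer swap would recreate the race the comment describes: an edit appended between the copy and the swap would exist only in the old file and be missing from the archive.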
[jira] [Updated] (HBASE-4968) Add to troubleshooting workaround for direct buffer oome's.
[ https://issues.apache.org/jira/browse/HBASE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-4968: - Attachment: client.oome.txt Add to troubleshooting workaround for direct buffer oome's. --- Key: HBASE-4968 URL: https://issues.apache.org/jira/browse/HBASE-4968 Project: HBase Issue Type: Task Reporter: stack Attachments: client.oome.txt Put into the book the workaround arrived at on the list discussing a client OOME.
[jira] [Created] (HBASE-4968) Add to troubleshooting workaround for direct buffer oome's.
Add to troubleshooting workaround for direct buffer oome's. --- Key: HBASE-4968 URL: https://issues.apache.org/jira/browse/HBASE-4968 Project: HBase Issue Type: Task Reporter: stack Attachments: client.oome.txt Put into the book the workaround arrived at on the list discussing a client OOME.
[jira] [Resolved] (HBASE-4968) Add to troubleshooting workaround for direct buffer oome's.
[ https://issues.apache.org/jira/browse/HBASE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-4968. -- Resolution: Fixed Fix Version/s: 0.94.0 Assignee: stack Committed to TRUNK. Add to troubleshooting workaround for direct buffer oome's. --- Key: HBASE-4968 URL: https://issues.apache.org/jira/browse/HBASE-4968 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.94.0 Attachments: client.oome.txt Put into the book the workaround arrived at on the list discussing a client OOME.
[jira] [Created] (HBASE-4969) tautology in HRegionInfo.readFields
tautology in HRegionInfo.readFields --- Key: HBASE-4969 URL: https://issues.apache.org/jira/browse/HBASE-4969 Project: HBase Issue Type: Bug Reporter: Prakash Khemani Assignee: Prakash Khemani In HRegionInfo.readFields() the following looks wrong to me: } else if (getVersion() == VERSION) { It is always true. Should it have been } else if (getVersion() == version) { ? version is a local variable where the deserialized version is stored. (I am struggling with another issue: after applying some patches - including HBASE-4388, Second start after migration from 90 to trunk crashes - my version of hbase-92 HRegionInfo.readFields() tries to find the HTD in HRegionInfo and fails.)
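The tautology can be reproduced in miniature (this is our stand-in class, not the real HRegionInfo): the buggy branch compares the class constant with itself, so it can never detect a version mismatch in the deserialized data:

```java
/** Minimal reproduction of the HBASE-4969 tautology. */
public class VersionCheck {
    static final byte VERSION = 1;

    static byte getVersion() { return VERSION; }

    /** As reported: constant == constant, so this is always true. */
    static boolean buggyCheck(byte deserializedVersion) {
        return getVersion() == VERSION; // tautology: ignores the wire value entirely
    }

    /** As intended: compare the constant against the version read from the stream. */
    static boolean fixedCheck(byte deserializedVersion) {
        byte version = deserializedVersion; // local variable holding the wire value
        return version == VERSION;
    }
}
```

The practical consequence is the one Prakash hits: a serialized record from an older layout takes the "current version" branch anyway, and readFields then looks for fields (like the HTD) that are not there.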
[jira] [Commented] (HBASE-4940) hadoop-metrics.properties can include configuration of the rest context for ganglia
[ https://issues.apache.org/jira/browse/HBASE-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164047#comment-13164047 ] Mubarak Seyed commented on HBASE-4940: -- Waiting for corporate approval to contribute this patch. Thanks. hadoop-metrics.properties can include configuration of the rest context for ganglia - Key: HBASE-4940 URL: https://issues.apache.org/jira/browse/HBASE-4940 Project: HBase Issue Type: Improvement Components: metrics Affects Versions: 0.90.5 Environment: HBase-0.90.1 Reporter: Mubarak Seyed Priority: Minor Labels: hbase-rest Fix For: 0.90.5 It appears from hadoop-metrics.properties that the configuration for the rest context is missing. It would be good to add the rest context, commented out; anyone running the rest-server who wants to monitor it with the ganglia context can then uncomment these lines for rest-server monitoring. {code} # Configuration of the rest context for ganglia #rest.class=org.apache.hadoop.metrics.ganglia.GangliaContext #rest.period=10 #rest.servers=ganglia-metad-hostname:port {code} Working on the patch, will submit it.
[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server
[ https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164046#comment-13164046 ] Mubarak Seyed commented on HBASE-4720: -- Waiting for corporate approval to contribute this patch. Thanks. Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server Key: HBASE-4720 URL: https://issues.apache.org/jira/browse/HBASE-4720 Project: HBase Issue Type: Improvement Reporter: Daniel Lord I have several large application/HBase clusters where an application node will occasionally need to talk to HBase from a different cluster. In order to help ensure some of my consistency guarantees I have a sentinel table that is updated atomically as users interact with the system. This works quite well for the regular hbase client but the REST client does not implement the checkAndPut and checkAndDelete operations. This exposes the application to some race conditions that have to be worked around. It would be ideal if the same checkAndPut/checkAndDelete operations could be supported by the REST client.
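The checkAndPut semantics the REST client is missing can be modeled in plain Java (a toy model, not the HBase or REST API): update a row's value only if its current value matches an expected one, in a single atomic step, which is what makes the sentinel-table pattern above safe across concurrent writers:

```java
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of checkAndPut: an atomic compare-and-swap keyed by row. */
public class CheckAndPutModel {
    private final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();

    /** expected == null means "only put if the row does not exist yet". */
    public boolean checkAndPut(String row, String expected, String value) {
        if (expected == null) {
            return table.putIfAbsent(row, value) == null;
        }
        return table.replace(row, expected, value); // atomic compare-and-swap
    }

    public String get(String row) { return table.get(row); }
}
```

Without the atomic form, a client must do a separate get-then-put, and a second writer can slip in between — the race condition the reporter has to work around today.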
[jira] [Commented] (HBASE-4957) Clean up some log messages, code in RecoverableZooKeeper
[ https://issues.apache.org/jira/browse/HBASE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164050#comment-13164050 ] ramkrishna.s.vasudevan commented on HBASE-4957: --- +1 on patch Clean up some log messages, code in RecoverableZooKeeper Key: HBASE-4957 URL: https://issues.apache.org/jira/browse/HBASE-4957 Project: HBase Issue Type: Improvement Components: zookeeper Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Fix For: 0.94.0 Attachments: hbase-4957.txt, hbase-4957.txt In RecoverableZooKeeper, there are a number of log messages and comments which don't really read correctly, and some other pieces of code that can be cleaned up. Simple cleanup - shouldn't be any actual behavioral changes.
[jira] [Commented] (HBASE-4951) master process can not be stopped when it is initializing
[ https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164072#comment-13164072 ] ramkrishna.s.vasudevan commented on HBASE-4951: --- The master is getting stopped in 0.92. I tried it. Correct me if I am wrong. If it is OK, can we close this issue? master process can not be stopped when it is initializing - Key: HBASE-4951 URL: https://issues.apache.org/jira/browse/HBASE-4951 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.3 Reporter: xufeng Priority: Critical Fix For: 0.92.0, 0.90.5 It is easy to reproduce with the following steps: step 1: start the master process (do not start any regionserver process in the cluster). The master will wait for regionservers to check in: org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to checkin step 2: stop the master with the shell command bin/hbase master stop result: the master process will never die, because the catalogTracker.waitForRoot() method blocks until the root region is assigned.
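The hang described above can be illustrated in miniature: a wait loop that blocks solely on the awaited resource can never observe a stop request. A hedged sketch (hypothetical names, not the actual CatalogTracker code) of the fix, a wait that also checks a stop flag:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a wait loop that honors a stop flag, unlike a wait
// that blocks forever on the awaited condition (the failure mode reported).
public class StoppableWait {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private volatile boolean rootAssigned = false; // stands in for "root region assigned"

    public void stop() { stopped.set(true); }
    public void assignRoot() { rootAssigned = true; }

    /** Returns true if the resource arrived, false if stopped or timed out while waiting. */
    public boolean waitForRoot(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!rootAssigned) {
            if (stopped.get()) return false;                 // notice the stop request
            if (System.currentTimeMillis() > deadline) return false;
            Thread.sleep(10);                                // poll instead of blocking forever
        }
        return true;
    }
}
```

A wait that omits the stopped check is exactly the shape that keeps the master process alive after `bin/hbase master stop`.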
[jira] [Updated] (HBASE-4880) Region is on service before completing OpenRegionHandler, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chunhui shen updated HBASE-4880: Attachment: hbase-4880v2.patch Region is on service before completing OpenRegionHandler, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch, hbase-4880v2.patch OpenRegionHandler in the regionserver proceeds through the following steps:
{code}
1. openRegion()        (after this, closed = false, closing = false)
2. addToOnlineRegions(region)
3. update the .META. table
4. update the region's ZK node state to RS_ZK_REGION_OPENED
{code}
The region is in service before step 4, which means a client can put data to this region after step 3. What happens if step 4 fails? OpenRegionHandler#cleanupFailedOpen executes, which closes the region, and the master assigns the region to another regionserver. If closing the region also fails, data written between step 3 and step 4 may be lost, because the region has been opened on another regionserver and has received new data. The lost edits may then not be recoverable through replayRecoveredEdit(), because the edit's LogSeqId is smaller than the current region SeqId.
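The failure window can be shown with a toy simulation: a write accepted after the region is online but before the ZK confirmation vanishes when the region is reopened elsewhere from its last durable state. This is an illustrative sketch with hypothetical names, not the real OpenRegionHandler code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the window described in the report: a region accepts a
// write after it is put online but before the RS_ZK_REGION_OPENED update;
// if that update fails and the subsequent close also fails, a fresh open on
// another regionserver starts from the last durable state and the in-window
// write is gone.
public class OpenWindowSim {
    /** Returns true only if the in-window write survives the failed open. */
    static boolean simulate() {
        List<String> durable = new ArrayList<>();            // last durable region state
        List<String> regionOnRs1 = new ArrayList<>(durable); // steps 1-3: region online on RS1
        regionOnRs1.add("row-written-in-window");            // client write after step 3
        boolean zkUpdateSucceeded = false;                   // step 4 fails...
        boolean closeSucceeded = false;                      // ...and so does the close
        if (!zkUpdateSucceeded && closeSucceeded) {
            durable.addAll(regionOnRs1);                     // a clean close would persist edits
        }
        List<String> regionOnRs2 = new ArrayList<>(durable); // master reassigns to RS2
        return regionOnRs2.contains("row-written-in-window");
    }
}
```

The simulation returns false, i.e. the write is lost, which is the data-loss scenario the issue describes.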
[jira] [Commented] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164079#comment-13164079 ] Phabricator commented on HBASE-4908: mbautin has commented on the revision [jira] [HBASE-4908] HBase cluster test tool (port from 0.89-fb). Thanks for reviews, Nicolas and Stack! See responses below. A new version of the code will follow. Still have to re-run the unit tests (I've been having some trouble with those recently -- http://pastebin.com/1G1ZcPeV) and sanity-check the command-line load tester. On a side note, here are some stats from my recent load test run on a 5-node cluster: 11/12/06 18:30:25 INFO util.MultiThreadedAction: [W:21] Keys=17180542, cols=819.4m, time=27:49:20 Overall: [keys/s= 171, latency=116 ms] Current: [keys/s=219, latency=90 ms], insertedUpTo=17180495, insertedQSize=26 11/12/06 18:30:25 INFO util.MultiThreadedAction: [R:10] Keys=250690067, cols=11.8g, time=27:49:20 Overall: [keys/s= 2502, latency=3 ms] Current: [keys/s=261, latency=38 ms], verified=250690067 (The number of writer's threads is reported as 21 because there is an inserted keys tracker thread that keeps track of the most recent contiguous key written by all writers.) INLINE COMMENTS src/test/java/org/apache/hadoop/hbase/util/IntegrationTestTool.java:129 Changed this to 80. There is no standard way to get terminal width in Java. http://stackoverflow.com/questions/1286461/can-i-find-the-console-width-with-java src/test/java/org/apache/hadoop/hbase/util/LoadTest.java:40 This is the command-line part of LoadTest, so it should not run as part of the test suite. The same multithreaded writer/verifier code is reused in a couple of large unit tests, namely TestMiniClusterLoad{Parallel,Sequential}. I added a line about the differences between this load tester and PerformanceEvaluation. Renamed this to LoadTestTool. 
src/test/java/org/apache/hadoop/hbase/util/LoadTest.java:318 We first wait for all writers to finish and then wait for all readers to finish, so we should exit when all of those threads stop. src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java:37 Yes, this can be converted to an IntegrationTest. I think making this a unit test was proposed in the past, but the controversy was that it spawns a bunch of child processes and extra care should be taken to shut down all of them. We can leave this as a command-line tool for now and convert it to a unit test when it is more stable. src/main/java/org/apache/hadoop/hbase/util/Bytes.java:1658-1680 Removed these methods and reused the existing RegionSplitter.HexStringSplit class. src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java:28 I moved this from the test jar to the main jar because of the waitForBaseZNode method that I added to ZKUtil. We need this kind of watcher for quick ZK checks when we don't want to use a callback. Passing a null instead of a watcher does not work. src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java:176 Not sure what changes you are talking about. I made the few conf options that I needed into constants in HConstants. src/main/java/org/apache/hadoop/hbase/util/Keying.java:41-43 Removed this. src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java:1820-1821 Done. Thanks! src/test/java/org/apache/hadoop/hbase/util/IntegrationTestTool.java:39 Renamed this to AbstractHBaseTool and moved it to the main jar, since this is general-purpose command-line processing functionality. This also avoids confusion with the IntegrationTest notation (thanks for the link!). I think we will be creating a lot of integration tests very soon. 
REVISION DETAIL https://reviews.facebook.net/D549 HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch Porting one of our HBase cluster test tools (a single-process multi-threaded load generator and verifier) from 0.89-fb to trunk. I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that it has some features that I have not tried yet (some kind of a kill test, and some way to run HBase as multiple processes on one machine). The main utility of this piece of code for us has been the HBaseClusterTest command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a load test in our five-node dev cluster testing, e.g.: hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn load_test -read
[jira] [Commented] (HBASE-4880) Region is on service before completing OpenRegionHandler, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164083#comment-13164083 ] chunhui shen commented on HBASE-4880: - @all: what about patch v2? I removed {code} addToOnlineRegions(r); {code} from HRegionServer#postOpenDeployTasks. This means that step 2 and step 3 are now independent; after calling postOpenDeployTasks, we should promptly add the region to the online regions as part of the transaction. Region is on service before completing OpenRegionHandler, may cause data loss - Key: HBASE-4880 URL: https://issues.apache.org/jira/browse/HBASE-4880 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-4880.patch, hbase-4880v2.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4908: --- Attachment: D549.8.patch mbautin updated the revision [jira] [HBASE-4908] HBase cluster test tool (port from 0.89-fb). Reviewers: stack, Karthik, Kannan, nspiegelberg, JIRA Addressing Stack's and Nicolas's comments. Still need to re-run the unit tests. REVISION DETAIL https://reviews.facebook.net/D549 AFFECTED FILES pom.xml src/main/java/org/apache/hadoop/hbase/EmptyWatcher.java src/test/java/org/apache/hadoop/hbase/EmptyWatcher.java src/main/java/org/apache/hadoop/hbase/HConstants.java src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java src/main/java/org/apache/hadoop/hbase/util/AbstractHBaseTool.java src/main/java/org/apache/hadoop/hbase/util/Bytes.java src/main/java/org/apache/hadoop/hbase/util/HMerge.java src/main/java/org/apache/hadoop/hbase/util/Keying.java src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZKConfig.java src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java src/test/java/org/apache/hadoop/hbase/regionserver/TestRegionSplitPolicy.java 
src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java src/test/java/org/apache/hadoop/hbase/util/LoadTestKVGenerator.java src/test/java/org/apache/hadoop/hbase/util/LoadTestTool.java src/test/java/org/apache/hadoop/hbase/util/MultiThreadedAction.java src/test/java/org/apache/hadoop/hbase/util/MultiThreadedReader.java src/test/java/org/apache/hadoop/hbase/util/MultiThreadedWriter.java src/test/java/org/apache/hadoop/hbase/util/ProcessBasedLocalHBaseCluster.java src/test/java/org/apache/hadoop/hbase/util/RestartMetaTest.java src/test/java/org/apache/hadoop/hbase/util/TestBytes.java src/test/java/org/apache/hadoop/hbase/util/TestLoadTestKVGenerator.java src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadParallel.java src/test/java/org/apache/hadoop/hbase/util/TestMiniClusterLoadSequential.java HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Commented] (HBASE-4729) Clash between region unassign and splitting kills the master
[ https://issues.apache.org/jira/browse/HBASE-4729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164085#comment-13164085 ] Hudson commented on HBASE-4729: --- Integrated in HBase-0.92 #173 (See [https://builds.apache.org/job/HBase-0.92/173/]) HBASE-4729 Clash between region unassign and splitting kills the master stack : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java Clash between region unassign and splitting kills the master Key: HBASE-4729 URL: https://issues.apache.org/jira/browse/HBASE-4729 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Jean-Daniel Cryans Assignee: stack Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 4729-v2.txt, 4729-v3.txt, 4729-v4.txt, 4729-v5.txt, 4729-v6-092.txt, 4729-v6-trunk.txt, 4729.txt I was running an online alter while regions were splitting, and suddenly the master died and left my table half-altered (haven't restarted the master yet). 
What killed the master:
{quote}
2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
 at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
 at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
 at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
 at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
{quote}
A znode was created because the region server was splitting the region 4 seconds before:
{quote}
2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101. 
2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
...
2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT
2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101
{quote}
Now that the master is dead, the region server is spewing those last two lines like mad.
[jira] [Commented] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164087#comment-13164087 ] Hadoop QA commented on HBASE-4908: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506381/D549.8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 52 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/455//console This message is automatically generated. HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Created] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
Add a parameter to change keepAliveTime of Htable thread pool. --- Key: HBASE-4970 URL: https://issues.apache.org/jira/browse/HBASE-4970 Project: HBase Issue Type: Improvement Components: client Reporter: gaojinchao Assignee: gaojinchao Priority: Trivial In my cluster, I changed keepAliveTime from 60 s to 3600 s, and the growth of RES slowed down. Why does increasing the keepAliveTime of the HBase thread pool slow down our problem occurrence (the RES value increase)? Look at the source of sun.nio.ch.Util: every thread holds 3 soft references to direct buffers (mustangsrc) for reuse. The code calls these 3 soft references the buffer cache. If all cached buffers are occupied, or none is suitable in size, and a new request comes in, a new direct buffer is allocated. After the request is served, the bigger buffer replaces the smaller one in the buffer cache, and the replaced buffer is released. So I think we can add a parameter to change the keepAliveTime of the HTable thread pool.
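The tunable being proposed maps directly onto java.util.concurrent: HTable's pool is a ThreadPoolExecutor, whose keepAliveTime controls how long idle worker threads (and hence their per-thread direct-buffer caches) survive. A hedged sketch of building such a pool with a configurable idle lifetime (KeepAliveDemo and its parameters are illustrative, not HBase's actual constructor code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class KeepAliveDemo {
    // Build an HTable-style pool whose idle-thread lifetime is configurable.
    // With a long keepAliveTime, worker threads (and the direct buffers that
    // sun.nio.ch.Util caches per thread) are recycled instead of churned.
    static ThreadPoolExecutor newPool(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads,
                keepAliveSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // let even core threads expire when idle
        return pool;
    }

    /** Builds a pool with the reporter's 3600 s setting and reads back keepAliveTime. */
    static long demo() {
        ThreadPoolExecutor pool = newPool(10, 3600);
        long seconds = pool.getKeepAliveTime(TimeUnit.SECONDS);
        pool.shutdown();
        return seconds;
    }
}
```

Exposing keepAliveSeconds as a client configuration property is the essence of the request; the property name itself is left to the patch.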
[jira] [Updated] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
[ https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4970: -- Affects Version/s: 0.90.4 Fix Version/s: 0.90.5 Add a parameter to change keepAliveTime of Htable thread pool. --- Key: HBASE-4970 URL: https://issues.apache.org/jira/browse/HBASE-4970 Project: HBase Issue Type: Improvement Components: client Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Priority: Trivial Fix For: 0.90.5
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Open (was: Patch Available) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Status: Patch Available (was: Open) HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Updated] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4908: -- Attachment: 0002-HBase-cluster-test-tool.patch Uploading a patch for Jenkins testing. HBase cluster test tool (port from 0.89-fb) --- Key: HBASE-4908 URL: https://issues.apache.org/jira/browse/HBASE-4908 Project: HBase Issue Type: Sub-task Reporter: Mikhail Bautin Assignee: Mikhail Bautin Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch
[jira] [Commented] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164090#comment-13164090 ] Mubarak Seyed commented on HBASE-4893: -- HConnectionManager.deleteStaleConnection() is part of 0.92, but I would like to fix this in 0.90.5. Can we back-port deleteStaleConnection() to 0.90.5? If I use HConnectionManager.deleteConnection(connection, true) in 0.90.5, it will not clean up the stale connection unless connection.isZeroReference() == true, which it is not, because getConnection(conf) increments the count:
{code}
public static HConnection getConnection(Configuration conf)
    throws ZooKeeperConnectionException {
  HConnectionKey connectionKey = new HConnectionKey(conf);
  synchronized (HBASE_INSTANCES) {
    ..
    connection.incCount();
    return connection;
  }
}
{code}
We need the following methods in 0.90.5:
{code}
public static void deleteStaleConnection(HConnection connection) {
  deleteConnection(connection, true, true);
}

private static void deleteConnection(HConnection connection, boolean stopProxy,
    boolean staleConnection) {
  synchronized (HBASE_INSTANCES) {
    for (Entry<HConnectionKey, HConnectionImplementation> connectionEntry : HBASE_INSTANCES
        .entrySet()) {
      if (connectionEntry.getValue() == connection) {
        deleteConnection(connectionEntry.getKey(), stopProxy, staleConnection);
        break;
      }
    }
  }
}

private static void deleteConnection(HConnectionKey connectionKey, boolean stopProxy,
    boolean staleConnection) {
  synchronized (HBASE_INSTANCES) {
    HConnectionImplementation connection = HBASE_INSTANCES.get(connectionKey);
    if (connection != null) {
      connection.decCount();
      if (connection.isZeroReference() || staleConnection) {
        HBASE_INSTANCES.remove(connectionKey);
        connection.close(stopProxy);
      } else if (stopProxy) {
        connection.stopProxyOnClose(stopProxy);
      }
    }
  }
}
{code}
and deleteConnection(conf, stopProxy) needs to call deleteConnection(new HConnectionKey(conf), stopProxy, false):
{code}
public static void deleteConnection(Configuration conf, boolean stopProxy) {
  deleteConnection(new HConnectionKey(conf), stopProxy, false);
}
{code}
HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection Key: HBASE-4893 URL: https://issues.apache.org/jira/browse/HBASE-4893 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.1, 0.90.2, 0.90.3, 0.90.4 Environment: Linux 2.6, HBase-0.90.1 Reporter: Mubarak Seyed Labels: hbase-client Fix For: 0.90.1, 0.90.5 In abort() of HConnectionManager$HConnectionImplementation, the instance of HConnectionImplementation is marked as this.closed=true. There is no way for a client application to check whether the hbase client connection is still open/good (this.closed=false) or not. We need a method to query the state of a connection, like isClosed():
{code}
public boolean isClosed() {
  return this.closed;
}
{code}
Once the connection is closed, it should get deleted. A client application still gets the connection from HConnectionManager.getConnection(Configuration) and tries to make an RPC call to a RS; since the connection is already closed, HConnectionImplementation.getRegionServerWithRetries throws RetriesExhaustedException with the error message
{code}
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server null for region , row '----xxx', but failed after 10 attempts. 
Exceptions: java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException: org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7 closed java.io.IOException:
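The reference-counting semantics proposed above (a stale connection is removed and closed even when its count is nonzero) can be illustrated with a plain-Java sketch. The class and method names here are illustrative stand-ins, not the actual HConnectionManager API:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the proposed deleteConnection(key, stopProxy, stale)
// behavior: normal deletes only remove the entry when the reference count
// reaches zero, but a stale delete removes and closes it unconditionally.
public class ConnRegistry {
    static class Conn {
        int count = 0;
        boolean closed = false;
        void close() { closed = true; }
    }

    private final Map<String, Conn> instances = new HashMap<>();

    public synchronized Conn getConnection(String key) {
        Conn c = instances.computeIfAbsent(key, k -> new Conn());
        c.count++;                       // mirrors connection.incCount()
        return c;
    }

    public synchronized void deleteConnection(String key, boolean stale) {
        Conn c = instances.get(key);
        if (c == null) return;
        c.count--;                       // mirrors connection.decCount()
        if (c.count <= 0 || stale) {     // isZeroReference() || staleConnection
            instances.remove(key);
            c.close();
        }
    }

    public synchronized boolean contains(String key) {
        return instances.containsKey(key);
    }

    public static void main(String[] args) {
        ConnRegistry registry = new ConnRegistry();
        Conn conn = registry.getConnection("zk-quorum");
        registry.getConnection("zk-quorum");          // second user, count == 2
        registry.deleteConnection("zk-quorum", true); // stale: removed anyway
        System.out.println(registry.contains("zk-quorum") + " " + conn.closed);
    }
}
```

With this shape, a client that catches the "connection closed" IOException can evict the stale entry and fetch a fresh connection, instead of repeatedly getting back the closed instance.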
[jira] [Updated] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mubarak Seyed updated HBASE-4893:
---------------------------------
    Labels: noob  (was: hbase-client)
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164104#comment-13164104 ]

Ted Yu commented on HBASE-4880:
-------------------------------

Did RestAdmin pass for patch v2?


Region is on service before completing openRegionHanlder, may cause data loss
-----------------------------------------------------------------------------

                 Key: HBASE-4880
                 URL: https://issues.apache.org/jira/browse/HBASE-4880
             Project: HBase
          Issue Type: Bug
            Reporter: chunhui shen
            Assignee: chunhui shen
         Attachments: hbase-4880.patch, hbase-4880v2.patch

OpenRegionHandler in the regionserver proceeds through the following steps:

{code}
1. openregion()  (through it, closed = false, closing = false)
2. addToOnlineRegions(region)
3. update .META. table
4. update ZK's node state to RS_ZK_REGION_OPENED
{code}

The region is on service before step 4, which means a client can put data to this region after step 3. What happens if step 4 fails? OpenRegionHandler#cleanupFailedOpen executes, which closes the region, and the master assigns this region to another regionserver. If closing the region fails, the data put between step 3 and step 4 may be lost, because the region has already been opened on another regionserver and new data put there. The lost edits may then not be recoverable through replayRecoveredEdit(), because the edits' LogSeqIds are smaller than the current region SeqId.
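The window described in the report (steps 1-3 succeed, step 4 fails) can be sketched as a small plain-Java simulation. Only the step names come from the issue; the fields and the write path are illustrative, not the real OpenRegionHandler code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the OpenRegionHandler ordering problem: the region accepts
// writes after the .META. update (step 3) but before the ZK transition
// (step 4), so a step-4 failure can strand those writes if the cleanup
// close itself fails and the region reopens elsewhere.
public class OpenRegionRace {
    boolean online = false;                       // steps 1-2
    final List<String> edits = new ArrayList<>(); // memstore stand-in

    boolean put(String edit) {                    // client write path
        if (!online) return false;
        edits.add(edit);
        return true;
    }

    // Returns the edits that are at risk when step 4 fails; empty otherwise.
    List<String> open(boolean zkUpdateSucceeds) {
        online = true;                 // steps 1-2: region marked open
        // step 3: .META. updated, clients can now locate the region
        put("edit-between-3-and-4");   // a write lands in the window
        if (!zkUpdateSucceeds) {       // step 4 fails -> cleanupFailedOpen
            online = false;            // close; these edits are stranded if
            return edits;              // the close fails and the region moves
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(new OpenRegionRace().open(false));
    }
}
```

The simulation makes the ordering argument concrete: any fix has to either delay step 2/3 until the ZK transition is durable, or guarantee the in-window edits survive a failed open.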
[jira] [Issue Comment Edited] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164104#comment-13164104 ]

Ted Yu edited comment on HBASE-4880 at 12/7/11 3:12 AM:
--------------------------------------------------------

Did TestAdmin pass for patch v2?

was (Author: yuzhih...@gmail.com): Did RestAdmin pass for patch v2?
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164110#comment-13164110 ]

chunhui shen commented on HBASE-4880:
-------------------------------------

@Ted
TestAdmin has passed for patch v2; you could try again.
[jira] [Commented] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164111#comment-13164111 ]

Mubarak Seyed commented on HBASE-4893:
--------------------------------------

Waiting for corporate approval to contribute this patch. Thanks.
[jira] [Updated] (HBASE-4893) HConnectionImplementation closed-but-not-deleted, need a way to find the state of connection
[ https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mubarak Seyed updated HBASE-4893:
---------------------------------
    Affects Version/s:     (was: 0.90.4)
                           (was: 0.90.3)
                           (was: 0.90.2)
        Fix Version/s:     (was: 0.90.1)
[jira] [Commented] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
[ https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164119#comment-13164119 ]

ramkrishna.s.vasudevan commented on HBASE-4970:
-----------------------------------------------

+1 on this change.


Add a parameter to change keepAliveTime of Htable thread pool.
--------------------------------------------------------------

                 Key: HBASE-4970
                 URL: https://issues.apache.org/jira/browse/HBASE-4970
             Project: HBase
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.90.4
            Reporter: gaojinchao
            Assignee: gaojinchao
            Priority: Trivial
             Fix For: 0.90.5

In my cluster I changed keepAliveTime from 60 s to 3600 s, and the growth of RES slowed down. Why does increasing the keepAliveTime of the HBase thread pool slow down our problem (the RES value increase)? Look at the source of sun.nio.ch.Util: every thread holds soft references to three direct buffers for reuse (the code calls these three soft references the buffer cache). If the buffers are all occupied, or none is a suitable size when a new request comes in, a new direct buffer is allocated. After servicing the request, the bigger buffer replaces the smaller one in the buffer cache, and the replaced buffer is released. So I think we can add a parameter to change the keepAliveTime of the HTable thread pool.
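The proposal amounts to making the pool's idle-thread timeout a parameter instead of a hard-coded 60 s. A minimal sketch using the standard java.util.concurrent API (the factory name is illustrative; this is not the actual HTable pool construction code):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolFactory {
    // Builds an HTable-style worker pool whose keepAliveTime is configurable.
    // A longer keep-alive means idle threads (and the direct buffers cached
    // per thread by sun.nio.ch.Util) are destroyed and reallocated less often,
    // which is the RES-growth effect described in the issue.
    public static ThreadPoolExecutor newPool(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, maxThreads,
            keepAliveSeconds, TimeUnit.SECONDS,   // the proposed parameter
            new SynchronousQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);        // let even core threads expire
        return pool;
    }
}
```

A caller would read the value from the client Configuration (under some new, yet-to-be-named property) and pass it in, e.g. `newPool(maxThreads, conf.getLong(..., 60))`.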
[jira] [Commented] (HBASE-4908) HBase cluster test tool (port from 0.89-fb)
[ https://issues.apache.org/jira/browse/HBASE-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164120#comment-13164120 ]

Hadoop QA commented on HBASE-4908:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12506382/0002-HBase-cluster-test-tool.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 74 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -160 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 74 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
    org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap
    org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/456//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/456//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/456//console

This message is automatically generated.


HBase cluster test tool (port from 0.89-fb)
-------------------------------------------

                 Key: HBASE-4908
                 URL: https://issues.apache.org/jira/browse/HBASE-4908
             Project: HBase
          Issue Type: Sub-task
            Reporter: Mikhail Bautin
            Assignee: Mikhail Bautin
         Attachments: 0001-HBase-cluster-test-tool.patch, 0002-HBase-cluster-test-tool.patch, D549.1.patch, D549.2.patch, D549.3.patch, D549.4.patch, D549.5.patch, D549.6.patch, D549.7.patch, D549.8.patch

Porting one of our HBase cluster test tools (a single-process multi-threaded load generator and verifier) from 0.89-fb to trunk. I cleaned up the code a bit compared to what's in 0.89-fb, and discovered that it has some features that I have not tried yet (some kind of a kill test, and some way to run HBase as multiple processes on one machine). The main utility of this piece of code for us has been the HBaseClusterTest command-line tool (called HBaseTest in 0.89-fb), which we usually invoke as a load test in our five-node dev cluster testing, e.g.:

{code}
hbase org.apache.hadoop.hbase.manual.HBaseTest -load 10:50:100:20 -tn load_test -read 1:10:50:20 -zk zk_quorum -bloom ROWCOL -compression GZIP
{code}

I will be using this code to load-test the delta encoding patch and making fixes, but I am submitting the patch for early feedback. I will probably try out its other functionality and comment on how it works.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164122#comment-13164122 ]

Zhihong Yu commented on HBASE-4880:
-----------------------------------

Patch v2 is cleaner. Minor comment: the change in RegionServerServices.java shouldn't put the explanation for parameters on a separate line; a parameter name and its explanation should be on the same line. Good job Chunhui.
[jira] [Commented] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
[ https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164123#comment-13164123 ]

Zhihong Yu commented on HBASE-4970:
-----------------------------------

@Jinchao: Can you upload a patch? Thanks
[jira] [Commented] (HBASE-4970) Add a parameter to change keepAliveTime of Htable thread pool.
[ https://issues.apache.org/jira/browse/HBASE-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164127#comment-13164127 ]

gaojinchao commented on HBASE-4970:
-----------------------------------

OK, no problem.
[jira] [Commented] (HBASE-4224) Need a flush by regionserver rather than by table option
[ https://issues.apache.org/jira/browse/HBASE-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164130#comment-13164130 ]

Akash Ashok commented on HBASE-4224:
------------------------------------

I am done with the testing for this patch, though when I was writing the test cases I could only think of one. Could someone please help me out as to what different test cases I should be looking to cover for this patch?


Need a flush by regionserver rather than by table option
--------------------------------------------------------

                 Key: HBASE-4224
                 URL: https://issues.apache.org/jira/browse/HBASE-4224
             Project: HBase
          Issue Type: Bug
          Components: shell
            Reporter: stack
            Assignee: Akash Ashok
         Attachments: HBase-4224.patch

This evening I needed to clean out logs on the cluster. Logs are kept per regionserver; to let go of a log, we need all of its edits emptied from memory, but flush is only available by table or by region. We need to be able to flush the regionserver. Need to add this.
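The requested operation reduces to "flush every region the server currently hosts", so old WAL files no longer pin any unpersisted edit. A minimal plain-Java sketch of that loop (the Region interface here is an illustrative stand-in, not the real HRegionServer API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a per-regionserver flush is just "flush every online region".
// Once every memstore is persisted to HFiles, the server's old WAL files
// no longer hold the only copy of any edit and can be archived.
public class RegionServerFlush {

    /** Illustrative stand-in for an online region. */
    public interface Region {
        String name();
        void flush();   // force memstore -> HFile
    }

    /** Flushes every region and returns the names flushed, in order. */
    public static List<String> flushAll(List<Region> onlineRegions) {
        List<String> flushed = new ArrayList<>();
        for (Region r : onlineRegions) {
            r.flush();
            flushed.add(r.name());
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        for (String n : new String[] {"region-a", "region-b"}) {
            regions.add(new Region() {
                public String name() { return n; }
                public void flush() { /* no-op stub */ }
            });
        }
        System.out.println(flushAll(regions));
    }
}
```

A shell command wrapping this would take a server name instead of a table name and drive the same loop over that server's online-region list.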
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164133#comment-13164133 ]

chunhui shen commented on HBASE-4880:
-------------------------------------

@Ted
I used the code style from HBASE-3678. It puts the explanation for parameters on a separate line automatically. Is there any problem with HBASE-3678?
[jira] [Updated] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-4880:
--------------------------------
    Attachment: hbase-4880v3.patch
[jira] [Commented] (HBASE-4927) CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
[ https://issues.apache.org/jira/browse/HBASE-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164148#comment-13164148 ]

Hudson commented on HBASE-4927:
-------------------------------

Integrated in HBase-0.92 #174 (See [https://builds.apache.org/job/HBase-0.92/174/])
HBASE-4927 CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty

stack :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionInfo.java


CatalogJanior:SplitParentFirstComparator doesn't sort as expected, for the last region when the endkey is empty
---------------------------------------------------------------------------------------------------------------

                 Key: HBASE-4927
                 URL: https://issues.apache.org/jira/browse/HBASE-4927
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.92.0, 0.94.0
            Reporter: Jimmy Xiang
            Assignee: Jimmy Xiang
            Priority: Minor
             Fix For: 0.92.0
         Attachments: 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator-.patch, 0001-HBASE-4927-CatalogJanior-SplitParentFirstComparator_v2.patch, hbase-4927-fix-ws.txt

When reviewing the HBASE-4238 backport, Jon found this issue. What happens if the split points are as follows (an empty end key is the last key, an empty start key is the first key)?

Parent: [A,)
L daughter: [A,B)
R daughter: [B,)

When sorted, we get to the end-key comparison, which results in this incorrect order: [A,B), [A,), [B,). We wanted: [A,), [A,B), [B,).
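The desired ordering, parent [A,) before its left daughter [A,B), requires the tie-break on end keys to treat an empty end key as "infinitely large" and to put the larger range first. A minimal sketch with String keys (illustrative only; the real comparator operates on HRegionInfo byte[] keys):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Regions as {start, end} with "" meaning "unbounded". Sort by start key
// ascending; on ties, sort so the bigger range (the split parent) comes
// first, treating an empty end key as larger than any concrete key.
public class SplitParentFirst {

    // Compare end keys where "" (last region) is the largest possible key.
    static int cmpEnd(String a, String b) {
        if (a.equals(b)) return 0;
        if (a.isEmpty()) return 1;
        if (b.isEmpty()) return -1;
        return a.compareTo(b);
    }

    static final Comparator<String[]> SPLIT_PARENT_FIRST = (x, y) -> {
        int c = x[0].compareTo(y[0]);   // start keys ascending
        if (c != 0) return c;
        return -cmpEnd(x[1], y[1]);     // bigger range (parent) first
    };

    public static void main(String[] args) {
        List<String[]> regions = Arrays.asList(
            new String[] {"A", "B"},    // left daughter
            new String[] {"A", ""},     // split parent
            new String[] {"B", ""});    // right daughter
        regions.sort(SPLIT_PARENT_FIRST);
        for (String[] r : regions) {
            System.out.println("[" + r[0] + "," + r[1] + ")");
        }
    }
}
```

Sorting the three example regions with this comparator yields [A,), [A,B), [B,), matching the order the report asks for.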
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164149#comment-13164149 ]

chunhui shen commented on HBASE-4880:
---
@Ted
Amended the problem of putting the explanation for parameters on a separate line in patch v3. Please check. Thanks.

Region is on service before completing openRegionHanlder, may cause data loss
---
Key: HBASE-4880
URL: https://issues.apache.org/jira/browse/HBASE-4880
Project: HBase
Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
Attachments: hbase-4880.patch, hbase-4880v2.patch, hbase-4880v3.patch

OpenRegionHandler in the regionserver proceeds through the following steps:
{code}
1. openRegion() (after it, closed = false, closing = false)
2. addToOnlineRegions(region)
3. update the .META. table
4. update the ZK node state to RS_ZK_REGION_OPENED
{code}
The region is on service before step 4, which means a client could put data to this region after step 3. What happens if step 4 fails? OpenRegionHandler#cleanupFailedOpen runs, which closes the region, and the master assigns the region to another regionserver. If closing the region fails, data put between steps 3 and 4 may be lost, because the region has been opened on another regionserver and is taking new writes. It may not be recovered through replayRecoveredEdit() because the edit's LogSeqId is smaller than the current region SeqId.
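The failure window above can be sketched in a few lines. This is a hypothetical simplification, not the real OpenRegionHandler API: the point is that the region must not be marked online until the ZK transition has succeeded, so no client write can land in the window between steps 3 and 4:

```java
// Sketch of the ordering fix implied by HBASE-4880 (hypothetical names).
// In the unsafe sequence the region serves writes between the meta update
// and the ZK transition; if the transition fails and the region is
// reassigned, those writes can be lost.
public class OpenRegionSketch {
    boolean online = false;

    boolean updateMeta() { return true; }
    boolean transitionZkToOpened(boolean zkSucceeds) { return zkSucceeds; }

    // Safe variant: mark the region online only AFTER the ZK transition
    // succeeds, so a failed open leaves nothing for clients to have written.
    boolean openSafely(boolean zkSucceeds) {
        if (!updateMeta()) return false;
        if (!transitionZkToOpened(zkSucceeds)) {
            return false; // region never went online; nothing to lose
        }
        online = true;
        return true;
    }
}
```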
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164154#comment-13164154 ]

Hadoop QA commented on HBASE-4880:
---
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506391/hbase-4880v3.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -160 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 72 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestColumnSeeking

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/458//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/458//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/458//console

This message is automatically generated.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164155#comment-13164155 ]

Hadoop QA commented on HBASE-4880:
---
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12506380/hbase-4880v2.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 javadoc. The javadoc tool appears to have generated -160 warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 72 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/457//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/457//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/457//console

This message is automatically generated.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164168#comment-13164168 ]

Ted Yu commented on HBASE-4880:
---
Please check the failed tests. It seems trunk is broken now.
[jira] [Commented] (HBASE-4880) Region is on service before completing openRegionHanlder, may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164169#comment-13164169 ]

chunhui shen commented on HBASE-4880:
---
I think it is not related to this patch. The same failed tests appear in https://builds.apache.org/job/PreCommit-HBASE-Build/456/testReport/
[jira] [Commented] (HBASE-4966) Put/Delete values cannot be tested with MRUnit
[ https://issues.apache.org/jira/browse/HBASE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164190#comment-13164190 ]

Lars Hofhansl commented on HBASE-4966:
---
Do you think you can work out a patch, Nicholas?

Put/Delete values cannot be tested with MRUnit
---
Key: HBASE-4966
URL: https://issues.apache.org/jira/browse/HBASE-4966
Project: HBase
Issue Type: Bug
Components: client, mapreduce
Affects Versions: 0.90.4
Reporter: Nicholas Telford
Priority: Minor

When using the IdentityTableReducer, which expects input values of either a Put or a Delete object, testing the Mapper with MRUnit is not possible because neither Put nor Delete implements equals(). We should implement equals() on both such that equality means:
* Both objects are of the same class (in this case, Put or Delete).
* Both objects are for the same key.
* Both objects contain an equal set of KeyValues (applicable only to Put).

KeyValue.equals() appears to already be implemented, but only checks for equality of row key, column family and column qualifier - two KeyValues can be considered equal even if they contain different values. This won't work for testing. Instead, the Put.equals() and Delete.equals() implementations should do a deep equality check on their KeyValues, along the lines of:
{code:java}
myKv.equals(theirKv) && Bytes.equals(myKv.getValue(), theirKv.getValue());
{code}
NOTE: This would impact any code that relies on the existing identity implementation of Put.equals() and Delete.equals(), and therefore cannot be guaranteed to be backwards-compatible.
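The proposed value-aware equals() can be sketched with simplified stand-ins. These are hypothetical classes, not the real Put/Delete/KeyValue API (which also carries family, qualifier, and timestamp); only row and value are modeled, enough to show the three equality conditions listed above:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of HBASE-4966: equality means same class, same key, and the same
// KeyValues compared INCLUDING their values.
public class MutationEqualsSketch {
    static final class Kv {
        final byte[] row, value;
        Kv(byte[] row, byte[] value) { this.row = row; this.value = value; }
        // Deep check: key equality alone is not enough for testing;
        // the stored values must match too.
        boolean deepEquals(Kv other) {
            return Arrays.equals(row, other.row) && Arrays.equals(value, other.value);
        }
    }

    static final class PutLike {
        final byte[] row;
        final List<Kv> kvs;
        PutLike(byte[] row, List<Kv> kvs) { this.row = row; this.kvs = kvs; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof PutLike)) return false;     // same class
            PutLike p = (PutLike) o;
            if (!Arrays.equals(row, p.row)) return false;  // same key
            if (kvs.size() != p.kvs.size()) return false;
            for (int i = 0; i < kvs.size(); i++) {         // equal KeyValues, values included
                if (!kvs.get(i).deepEquals(p.kvs.get(i))) return false;
            }
            return true;
        }
        @Override public int hashCode() { return Arrays.hashCode(row); }
    }
}
```

With such an equals(), MRUnit's expected-vs-actual comparison of reducer output values would work as intended.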
[jira] [Commented] (HBASE-4961) Lots of precommit builds hanging for days
[ https://issues.apache.org/jira/browse/HBASE-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164191#comment-13164191 ]

Lars Hofhansl commented on HBASE-4961:
---
I was hoping that these would all be hanging because of SecureRandom issues (blocking on /dev/random on Linux), but that does not appear to be the case.

Lots of precommit builds hanging for days
---
Key: HBASE-4961
URL: https://issues.apache.org/jira/browse/HBASE-4961
Project: HBase
Issue Type: Bug
Components: build, test
Affects Versions: 0.92.0, 0.94.0
Reporter: Todd Lipcon
Attachments: hbase-hung-builds.tar.gz

I was logged into the ASF build machines and saw about 10-15 HBase precommit builds that had been hung for weeks. I took a jstack of each, which I'll attach here. I then kill -9ed them to free up the resources.
[jira] [Commented] (HBASE-4936) Cached HRegionInterface connections crash when getting UnknownHost exceptions
[ https://issues.apache.org/jira/browse/HBASE-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164203#comment-13164203 ]

Hudson commented on HBASE-4936:
---
Integrated in HBase-TRUNK #2523 (See [https://builds.apache.org/job/HBase-TRUNK/2523/])
HBASE-4936 Cached HRegionInterface connections crash when getting UnknownHost exceptions
stack :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java

Cached HRegionInterface connections crash when getting UnknownHost exceptions
---
Key: HBASE-4936
URL: https://issues.apache.org/jira/browse/HBASE-4936
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.0
Reporter: Andrei Dragomir
Assignee: Andrei Dragomir
Fix For: 0.94.0
Attachments: HBASE-4936-v2.patch, HBASE-4936.patch

This issue is unlikely to come up in a cluster test case. However, for development, the following happens:
1. Start the HBase cluster locally, on network A (DNS A, etc.)
2. The region locations are cached using the hostname (mycomputer.company.com, 211.x.y.z - real ip)
3. Change network location (go home)
4. Start the HBase cluster locally. My hostname / IPs are now different (mycomputer, 192.168.0.130 - new ip)

If the region locations have been cached using the hostname, there is an UnknownHostException in CatalogTracker.getCachedConnection(ServerName sn), uncaught in the catch statements. The server will crash constantly. The error should be caught and not rethrown, so that the cached connection expires normally.
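The proposed handling can be sketched as follows. This is a hypothetical helper, not the actual CatalogTracker code: the point is simply that an unresolvable cached hostname is treated as "no connection" instead of an exception that takes down the caller, so the stale cache entry ages out normally:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of HBASE-4936's fix: resolve a cached server name, but treat an
// unresolvable host as a stale cache entry rather than a fatal error.
public class CachedConnectionSketch {
    // Returns the resolved address, or null when the cached hostname no
    // longer resolves (e.g. after a network change). A null return lets the
    // caller fall back to a fresh region location lookup.
    public static InetAddress resolveOrNull(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return null; // swallow: stale cached location, not a crash
        }
    }
}
```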
[jira] [Commented] (HBASE-4937) Error in Quick Start Shell Exercises
[ https://issues.apache.org/jira/browse/HBASE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164204#comment-13164204 ]

Hudson commented on HBASE-4937:
---
Integrated in HBase-TRUNK #2523 (See [https://builds.apache.org/job/HBase-TRUNK/2523/])
HBASE-4937 Error in Quick Start Shell Exercises
stack :
Files :
* /hbase/trunk/src/docbkx/getting_started.xml

Error in Quick Start Shell Exercises
---
Key: HBASE-4937
URL: https://issues.apache.org/jira/browse/HBASE-4937
Project: HBase
Issue Type: Bug
Components: documentation
Reporter: Ryan Berdeen
Assignee: stack
Fix For: 0.94.0
Attachments: 4937.txt

The shell exercises in the Quick Start (http://hbase.apache.org/book/quickstart.html) start with:
{code}
hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list 'table'
test
1 row(s) in 0.0550 seconds
{code}
It looks like the second command is wrong. Running it, the actual output is:
{code}
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 0.3630 seconds
hbase(main):002:0> list 'table'
TABLE
0 row(s) in 0.0100 seconds
{code}
The argument to list should be 'test', not 'table', and the output in the example is missing the {{TABLE}} line.
[jira] [Issue Comment Edited] (HBASE-4956) Control direct memory buffer consumption by HBaseClient
[ https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164199#comment-13164199 ]

Lars Hofhansl edited comment on HBASE-4956 at 12/7/11 7:18 AM:
---
We might not have to go all the way to using netty (although that would be nice). If we find that it is possible to avoid calling HBaseClient.Connection.sendParam from the client thread, and instead have the call actually be made from the Connection thread (after the callable representing the operation was queued), we have limited the number of threads that will have a cached DirectBuffer on their behalf.

Stack suggested the only reason for the direct call might be to pass errors back to the client. We could hand the client a Deferred or Future that will eventually hold any encountered exception; the client could (and would by default, to keep the current synchronous behavior) also wait on that object.
Control direct memory buffer consumption by HBaseClient
---
Key: HBASE-4956
URL: https://issues.apache.org/jira/browse/HBASE-4956
Project: HBase
Issue Type: New Feature
Reporter: Ted Yu

As Jonathan explained here https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1 , the standard hbase client inadvertently consumes a large amount of direct memory. We should consider using netty for NIO-related tasks.
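The Deferred/Future idea from the comment above can be sketched with CompletableFuture. All names here are hypothetical; the real HBaseClient.Connection API differs. The client thread only enqueues the call, a single connection thread performs the send (so per-client-thread cached DirectBuffers never accumulate), and any error travels back through the future, which a synchronous caller simply joins on:

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: confine socket writes to one connection thread; client threads
// get a CompletableFuture that eventually holds any encountered exception.
public class DeferredSendSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public DeferredSendSketch() {
        Thread connectionThread = new Thread(() -> {
            try {
                while (true) queue.take().run();
            } catch (InterruptedException ignored) { }
        });
        connectionThread.setDaemon(true);
        connectionThread.start();
    }

    // Called from a client thread; only enqueues work. The failSend flag
    // stands in for a real socket error during the write.
    public CompletableFuture<Void> sendParam(byte[] payload, boolean failSend) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        queue.add(() -> {
            if (failSend) {
                result.completeExceptionally(new IOException("send failed"));
            } else {
                // real code would write payload to the socket here
                result.complete(null);
            }
        });
        return result;
    }
}
```

A caller wanting today's synchronous semantics just calls `sendParam(...).join()`, which rethrows any send error; an asynchronous caller attaches a callback instead.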
[jira] [Commented] (HBASE-4968) Add to troubleshooting workaround for direct buffer oome's.
[ https://issues.apache.org/jira/browse/HBASE-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164201#comment-13164201 ]

Hudson commented on HBASE-4968:
---
Integrated in HBase-TRUNK #2523 (See [https://builds.apache.org/job/HBase-TRUNK/2523/])
HBASE-4968 Add to troubleshooting workaround for direct buffer oome's.
stack :
Files :
* /hbase/trunk/src/docbkx/troubleshooting.xml

Add to troubleshooting workaround for direct buffer oome's.
---
Key: HBASE-4968
URL: https://issues.apache.org/jira/browse/HBASE-4968
Project: HBase
Issue Type: Task
Reporter: stack
Assignee: stack
Fix For: 0.94.0
Attachments: client.oome.txt

Put into the book the workaround arrived at on the list while discussing the client OOME.
[jira] [Commented] (HBASE-4376) Document login configuration when running on top of secure Hadoop with Kerberos auth enabled
[ https://issues.apache.org/jira/browse/HBASE-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164202#comment-13164202 ]

Hudson commented on HBASE-4376:
---
Integrated in HBase-TRUNK #2523 (See [https://builds.apache.org/job/HBase-TRUNK/2523/])
HBASE-4376 Document mutual authentication between HBase and Zookeeper using SASL
stack :
Files :
* /hbase/trunk/src/docbkx/configuration.xml

Document login configuration when running on top of secure Hadoop with Kerberos auth enabled
---
Key: HBASE-4376
URL: https://issues.apache.org/jira/browse/HBASE-4376
Project: HBase
Issue Type: Task
Components: documentation, security
Affects Versions: 0.90.4
Reporter: Gary Helmling

We provide basic support for HBase to run on top of kerberos-authenticated Hadoop, by providing configuration options to have HMaster and HRegionServer log in from a keytab on startup. But this isn't documented anywhere outside of hbase-default.xml. We need to provide some basic guidance on setup in the HBase docs.