[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-5568:

    Attachment: HBASE-5568v2.patch

Multi concurrent flushcache() for one region could cause data loss

    Key: HBASE-5568
    URL: https://issues.apache.org/jira/browse/HBASE-5568
    Project: HBase
    Issue Type: Bug
    Components: regionserver
    Reporter: chunhui shen
    Assignee: chunhui shen
    Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
    Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch, HBASE-5568v2.patch

HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin.
However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly.
At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS.
As a result HRegion.memstoreSize becomes negative, which prevents the next flush if we close this region.

Logs in RS for region e9d827913a056e696c39bc569ea3

2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
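The accounting problem reported in HBASE-5568 can be replayed with plain arithmetic. The sketch below is not HBase code: the class and method names are hypothetical, sizes are in units of 0.1 MB rounded from the log lines in the report, and it assumes the region received exactly as much data as the three flushes wrote. It compares subtracting the size reported at "Finished memstore flush of ~X" with subtracting the sum of the per-store memsize values. The second variant is only one possible remedy and is not necessarily what the attached patches do.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Replays the three flushes from the HBASE-5568 log, in units of 0.1 MB. */
class FlushAccountingSketch {

    /** One flush: the size it reported and subtracted, and the size it actually wrote. */
    static final class Flush {
        final long reportedSize;
        final long actuallyFlushed;

        Flush(long reportedSize, long actuallyFlushed) {
            this.reportedSize = reportedSize;
            this.actuallyFlushed = actuallyFlushed;
        }
    }

    // Read off the log: "Finished memstore flush of ~X" versus the sum of the
    // per-store memsize values that carry the same sequenceid.
    static final Flush[] FLUSHES = {
        new Flush(1281, 596 + 685), // ~128.1m reported, 59.6m + 68.5m written
        new Flush(1348, 31 + 36),   // ~134.8m reported, 3.1m + 3.6m written
        new Flush(68, 1),           // ~6.8m reported, 47.4k + 47.8k (about 0.1m) written
    };

    /** Returns the final counter value after all three flushes have finished. */
    static long replay(boolean subtractReportedSize) {
        long totalWritten = 0;
        for (Flush f : FLUSHES) {
            totalWritten += f.actuallyFlushed;
        }
        // Assumption: the counter was incremented by exactly what was later flushed.
        AtomicLong memstoreSize = new AtomicLong(totalWritten);
        for (Flush f : FLUSHES) {
            long delta = subtractReportedSize ? f.reportedSize : f.actuallyFlushed;
            memstoreSize.addAndGet(-delta);
        }
        return memstoreSize.get();
    }

    public static void main(String[] args) {
        System.out.println("subtracting reported size: " + replay(true));
        System.out.println("subtracting flushed size:  " + replay(false));
    }
}
```

Under these assumptions the first variant ends well below zero, because the second and third flushes each subtract far more than they wrote, while the second variant returns to zero.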
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230940#comment-13230940 ]

Hadoop QA commented on HBASE-5568:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12518621/HBASE-5568v2.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1198//console

This message is automatically generated.
[jira] [Commented] (HBASE-4269) Add tests and restore semantics to TableInputFormat/TableRecordReader
[ https://issues.apache.org/jira/browse/HBASE-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230972#comment-13230972 ]

Jan Lukavsky commented on HBASE-4269:

Hi,

I think the patch for this issue changed the semantics of the mapreduce API. In HBASE-4196 there was no change in semantics in org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl; the only change was in org.apache.hadoop.hbase.mapred.TableRecordReaderImpl (where the catch of UnknownScannerException was changed to IOException). Now the semantics of the mapreduce API differ from what they were before HBASE-4196, and I think this should be reverted.

Is there any reason to have different semantics for the two APIs? Wouldn't it be better to accept the change of semantics in HBASE-4196? Are there any negative side effects of this change? I don't see any discussion of the question "do we need to change the semantics back?".

Thanks for your reply :)
Jan

Add tests and restore semantics to TableInputFormat/TableRecordReader

    Key: HBASE-4269
    URL: https://issues.apache.org/jira/browse/HBASE-4269
    Project: HBase
    Issue Type: Improvement
    Components: mapred, mapreduce, test
    Affects Versions: 0.90.5, 0.92.0
    Reporter: Jonathan Hsieh
    Assignee: Jonathan Hsieh
    Fix For: 0.90.5
    Attachments: 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch, 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch

HBASE-4196 modified the semantics of failures in TableInputFormat/TableRecordReader and had no test cases. This patch restores the semantics to rethrow when a DoNotRetryIOException is triggered, and adds test cases.
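The two failure semantics discussed in HBASE-4269 can be contrasted in a few lines. This is a minimal sketch, not the TableRecordReaderImpl code: DoNotRetry, Scanner, and ScannerFactory are hypothetical stand-ins defined here so the example compiles without HBase on the classpath. It shows the "rethrow on do-not-retry, restart on any other IOException" shape that the issue description says the patch restores.

```java
import java.io.IOException;

class RecordReaderRetrySketch {

    /** Hypothetical stand-in for HBase's DoNotRetryIOException. */
    static class DoNotRetry extends IOException {
        DoNotRetry(String message) {
            super(message);
        }
    }

    /** Hypothetical minimal scanner: returns the next row or fails. */
    interface Scanner {
        String next() throws IOException;
    }

    /** Hypothetical way to reopen a scanner after a failure. */
    interface ScannerFactory {
        Scanner open() throws IOException;
    }

    /**
     * Reads the next row. A DoNotRetry failure is rethrown to the caller;
     * any other IOException triggers one restart of the scanner.
     */
    static String nextWithRestart(Scanner scanner, ScannerFactory factory) throws IOException {
        try {
            return scanner.next();
        } catch (DoNotRetry e) {
            throw e; // the caller decides what to do; no silent restart
        } catch (IOException e) {
            return factory.open().next(); // assumed recoverable: reopen and read again
        }
    }
}
```

Catching the broader IOException without the first catch clause would restart the scanner on every failure, which is the behaviour the comment above attributes to the HBASE-4196 change in the mapred API.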
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Open (was: Patch Available)

Master can fail if ZooKeeper session expires

    Key: HBASE-5549
    URL: https://issues.apache.org/jira/browse/HBASE-5549
    Project: HBase
    Issue Type: Bug
    Components: master, zookeeper
    Affects Versions: 0.96.0
    Environment: all
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Minor
    Attachments: 5549.v6.patch

There is a retry mechanism in RecoverableZooKeeper, but when the session expires the whole ZooKeeperWatcher is recreated, so the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection.
This can also happen in real life, for example when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
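The failure sequence listed in HBASE-5549 comes down to a retry loop that keeps using an object which has since been replaced. The sketch below models that with hypothetical classes only (Connection and the CURRENT holder are illustrations, not RecoverableZooKeeper or ZooKeeperWatcher), and the second method is one conceivable way to avoid the problem, not a description of the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;

class StaleHandleRetrySketch {

    /** Hypothetical stand-in for a session-bound connection object. */
    static final class Connection {
        private volatile boolean expired;

        void expire() {
            expired = true;
        }

        String read() {
            if (expired) {
                throw new IllegalStateException("session expired");
            }
            return "ok";
        }
    }

    /** The live connection; replaced wholesale when the session expires. */
    static final AtomicReference<Connection> CURRENT =
        new AtomicReference<>(new Connection());

    /** Models "the session expires ... the session is recreated". */
    static void expireAndRecreate() {
        CURRENT.get().expire();
        CURRENT.set(new Connection());
    }

    /** Retries on the object captured before the loop; it never sees the new one. */
    static boolean retryOnCaptured(Connection captured, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                captured.read();
                return true;
            } catch (IllegalStateException e) {
                // fall through and retry on the same, now stale, object
            }
        }
        return false;
    }

    /** Looks the connection up again on every attempt. */
    static boolean retryWithLookup(int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                CURRENT.get().read();
                return true;
            } catch (IllegalStateException e) {
                // fall through; the next attempt re-reads CURRENT
            }
        }
        return false;
    }
}
```

Capturing CURRENT.get() before calling expireAndRecreate() and then passing it to retryOnCaptured exhausts every attempt, whereas retryWithLookup succeeds on its first try.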
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Attachment: nochange.patch
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4914) Enhance MapReduce TableInputFormat to Support N-mappers per Region
[ https://issues.apache.org/jira/browse/HBASE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231008#comment-13231008 ]

Cosmin Lehene commented on HBASE-4914:

Hadoop 0.20 doesn't behave well with a large number of map tasks, so we implemented an N-regions-per-map option (through a splits_per_map property).
I guess ideally we should be able to specify a min/max number of map tasks as well and have these two happen implicitly, perhaps with some sane thresholds.

Enhance MapReduce TableInputFormat to Support N-mappers per Region

    Key: HBASE-4914
    URL: https://issues.apache.org/jira/browse/HBASE-4914
    Project: HBase
    Issue Type: Sub-task
    Components: client, regionserver
    Reporter: Nicolas Spiegelberg
    Priority: Blocker
    Fix For: 0.94.0

Current TableInputFormat-based MR jobs create exactly one mapper per region, where each mapper sets one Scan with the appropriate start/stop row keys. This change allows jobs to be run with any number of mappers per region, so that when a mapper fails there is less data to reprocess.
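To make the "N mappers per region" idea in HBASE-4914 concrete, the sketch below cuts one region's key range into N contiguous sub-ranges, each of which could back its own Scan. It is an illustration only: real HBase row keys are byte arrays, whereas this example assumes hypothetical numeric keys so that the arithmetic stays visible, and the class and method names are not part of any HBase API.

```java
import java.util.ArrayList;
import java.util.List;

class RegionSplitSketch {

    /**
     * Splits the half-open key range [start, stop) into n contiguous
     * sub-ranges, returned as {lo, hi} pairs. Keys are modelled as longs;
     * very large ranges could overflow the intermediate product.
     */
    static List<long[]> split(long start, long stop, int n) {
        if (n <= 0 || stop < start) {
            throw new IllegalArgumentException("need n > 0 and start <= stop");
        }
        List<long[]> ranges = new ArrayList<>();
        long span = stop - start;
        for (int i = 0; i < n; i++) {
            long lo = start + span * i / n;
            long hi = start + span * (i + 1) / n;
            ranges.add(new long[] {lo, hi});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : split(0, 100, 4)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Each sub-range ends where the next begins, so no key is scanned twice and a failed mapper only has to redo its own slice.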
[jira] [Commented] (HBASE-4608) HLog Compression
[ https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231009#comment-13231009 ]

Lars Francke commented on HBASE-4608:

This seems to be missing documentation, no? Shouldn't the hbase.regionserver.wal.enablecompression key at least be in hbase-default.xml?

HLog Compression

    Key: HBASE-4608
    URL: https://issues.apache.org/jira/browse/HBASE-4608
    Project: HBase
    Issue Type: New Feature
    Reporter: Li Pi
    Assignee: Li Pi
    Fix For: 0.94.0
    Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt

The current bottleneck to HBase write speed is replicating the WAL appends across different datanodes. We can speed up this process by compressing the HLog. The current plan involves using a dictionary to compress the table name, region id, cf name, and possibly other bits of repeated data. The HLog format may also be changed in other ways to produce a smaller HLog.
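The dictionary approach mentioned in the HBASE-4608 description can be sketched as follows. This is a generic illustration of dictionary encoding for repeated fields such as the table name or column family name, with hypothetical class and method names; it makes no claim about the on-disk format or the data structures used by the attached patches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WalDictionarySketch {

    /** Marker returned when a value was not in the dictionary yet. */
    static final int NOT_FOUND = -1;

    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> valueByIndex = new ArrayList<>();

    /**
     * Returns the index of a previously seen value. For a first-time value,
     * registers it under the next free index and returns NOT_FOUND, so the
     * writer knows it must emit the full value once.
     */
    int findOrAdd(String value) {
        Integer index = indexByValue.get(value);
        if (index != null) {
            return index;
        }
        indexByValue.put(value, valueByIndex.size());
        valueByIndex.add(value);
        return NOT_FOUND;
    }

    /** Resolves an index back to its value, as a reader would. */
    String lookup(int index) {
        return valueByIndex.get(index);
    }
}
```

A writer would emit the literal value the first time findOrAdd returns NOT_FOUND and a short index on every later occurrence; a reader that registers literals in the same order rebuilds an identical dictionary and can resolve the indices with lookup.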
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231029#comment-13231029 ]

Hadoop QA commented on HBASE-5549:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12518639/nochange.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1199//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1199//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1199//console

This message is automatically generated.
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Attachment: 5549.v7.patch
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231159#comment-13231159 ]

Hadoop QA commented on HBASE-5549:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12518660/5549.v7.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 20 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.client.TestMultiParallel
        org.apache.hadoop.hbase.TestDrainingServer
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
        org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1200//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1200//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1200//console

This message is automatically generated.
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:

    Attachment: 5549.v8.patch
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5549: --- Attachment: 5549.v9.patch Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5549: --- Status: Patch Available (was: Open) Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
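The failure sequence in the HBASE-5549 description (a retry loop that keeps using an object which has since been replaced) can be illustrated with a minimal, self-contained Java sketch. It does not use the HBase or ZooKeeper APIs: FakeConnection, StaleRetryDemo and both retry helpers are hypothetical names invented for this illustration, and the sketch is not the approach taken in the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for a session-bound connection wrapper; not an HBase class. */
class FakeConnection {
    private final String name;
    private volatile boolean expired;

    FakeConnection(String name) { this.name = name; }

    void expire() { expired = true; }

    String read() {
        if (expired) {
            throw new IllegalStateException("session expired on " + name);
        }
        return "data via " + name;
    }
}

class StaleRetryDemo {
    /** Retries against one captured object: every attempt fails once that object is dead. */
    static String retryOnCapturedObject(FakeConnection conn, int attempts) {
        if (attempts < 1) throw new IllegalArgumentException("attempts must be >= 1");
        IllegalStateException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return conn.read();
            } catch (IllegalStateException e) {
                last = e; // same dead object will be used on the next attempt
            }
        }
        throw last;
    }

    /** Re-resolves the current object on each attempt, so a recreated object is picked up. */
    static String retryOnCurrentObject(Supplier<FakeConnection> current, int attempts) {
        if (attempts < 1) throw new IllegalArgumentException("attempts must be >= 1");
        IllegalStateException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return current.get().read();
            } catch (IllegalStateException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicReference<FakeConnection> holder =
            new AtomicReference<>(new FakeConnection("watcher-1"));
        FakeConnection captured = holder.get();       // retry loop grabs the old object
        captured.expire();                            // session expires in the meantime
        holder.set(new FakeConnection("watcher-2"));  // wrapper is recreated

        try {
            retryOnCapturedObject(captured, 3);
        } catch (IllegalStateException e) {
            System.out.println("stale retry failed: " + e.getMessage());
        }
        System.out.println(retryOnCurrentObject(holder::get, 3));
    }
}
```

Whether re-resolving the reference is appropriate in the real code depends on how ZooKeeperWatcher is shared, which this sketch does not model.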
[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId
[ https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231187#comment-13231187 ] Jonathan Hsieh commented on HBASE-5563: --- @Chunhui I get it -- we just didn't see the robot run the v2 to show that it fixed the problem. Are you ok with the newer patch (and test?) that fixes getRegionsOfTable? I think it is more intuitive since older HRI's with smaller datestamp/regionIds are smaller than newer HRI's with larger datestamp/regionIds. It's helpful to add comments and tests to show what you intend -- hopefully the updated patch I provided makes it clear. HRegionInfo#compareTo add the comparison of regionId Key: HBASE-5563 URL: https://issues.apache.org/jira/browse/HBASE-5563 Project: HBase Issue Type: Bug Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5563.patch, HBASE-5563v2.patch, HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch In the one region multi assigned case, we could find that two regions have the same table name, same startKey, same endKey, and different regionId, so these two regions are same in TreeMap but different in HashMap. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.
[ https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231204#comment-13231204 ] Storm Lee commented on HBASE-5578: -- I have hit it only once, so it may not reproduce easily. My HBase cluster has 9 RSes, and only one crashed. It was a fresh start-up with nothing under /hbase. I then used a tool to put data continuously (using HTable.put()). This RS had been running for about 45 hours when the NPE happened. Compactions and splits were also running the whole time. NPE when regionserver reported server load, caused rs stop. --- Key: HBASE-5578 URL: https://issues.apache.org/jira/browse/HBASE-5578 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.0 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0 Reporter: Storm Lee Priority: Critical The regionserver log: 2012-03-11 11:55:37,808 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server data3,60020,1331286604591: Unhandled exception: null java.lang.NullPointerException at org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788) at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994) at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800) at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678) at java.lang.Thread.run(Thread.java:662) 2012-03-11 11:55:37,808 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [] 2012-03-11 11:55:37,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, memstoreSizeMB=2854, 
readRequestsCount=1352673, writeRequestsCount=113137586, compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, blockCacheHitCount=87713, blockCacheMissCount=22144560, blockCacheEvictedCount=122, blockCacheHitRatio=0%, blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100 2012-03-11 11:55:37,992 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception: null -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231217#comment-13231217 ] Hadoop QA commented on HBASE-5549: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518664/5549.v8.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 20 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.TestZooKeeper Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1201//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1201//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1201//console This message is automatically generated. 
Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Jindal updated HBASE-5206: --- Attachment: 5206_trunk_latest_2.patch Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId
[ https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231236#comment-13231236 ] Jonathan Hsieh commented on HBASE-5563: --- On all versions, tests either came back clean or came back with intermittent failures that were clean when run locally. HRegionInfo#compareTo add the comparison of regionId Key: HBASE-5563 URL: https://issues.apache.org/jira/browse/HBASE-5563 Project: HBase Issue Type: Bug Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5563.patch, HBASE-5563v2.patch, HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch In the one region multi assigned case, we could find that two regions have the same table name, same startKey, same endKey, and different regionId, so these two regions are same in TreeMap but different in HashMap. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
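The TreeMap/HashMap mismatch described in HBASE-5563 can be reproduced with a small Java sketch. RegionKey is a hypothetical, simplified stand-in for HRegionInfo, and the two comparators stand in for compareTo before and after a regionId comparison is added; none of this is the code from the attached patches.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for HRegionInfo. */
class RegionKey {
    final String table;
    final String startKey;
    final String endKey;
    final long regionId;

    RegionKey(String table, String startKey, String endKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.endKey = endKey;
        this.regionId = regionId;
    }

    // equals/hashCode take regionId into account, so a HashMap sees two distinct keys.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RegionKey)) return false;
        RegionKey r = (RegionKey) o;
        return table.equals(r.table) && startKey.equals(r.startKey)
            && endKey.equals(r.endKey) && regionId == r.regionId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, startKey, endKey, regionId);
    }
}

class CompareToDemo {
    /** Ordering that ignores regionId: inconsistent with equals. */
    static final Comparator<RegionKey> WITHOUT_REGION_ID =
        Comparator.comparing((RegionKey r) -> r.table)
                  .thenComparing((RegionKey r) -> r.startKey)
                  .thenComparing((RegionKey r) -> r.endKey);

    /** Ordering that breaks ties on regionId: smaller (older) ids sort first. */
    static final Comparator<RegionKey> WITH_REGION_ID =
        WITHOUT_REGION_ID.thenComparingLong((RegionKey r) -> r.regionId);

    static int treeSize(Comparator<RegionKey> order, RegionKey... keys) {
        Map<RegionKey, String> map = new TreeMap<>(order);
        for (RegionKey k : keys) map.put(k, "server");
        return map.size();
    }

    static int hashSize(RegionKey... keys) {
        Map<RegionKey, String> map = new HashMap<>();
        for (RegionKey k : keys) map.put(k, "server");
        return map.size();
    }

    public static void main(String[] args) {
        RegionKey older = new RegionKey("t1", "", "", 1L);
        RegionKey newer = new RegionKey("t1", "", "", 2L);
        System.out.println("TreeMap without regionId: " + treeSize(WITHOUT_REGION_ID, older, newer));
        System.out.println("HashMap:                  " + hashSize(older, newer));
        System.out.println("TreeMap with regionId:    " + treeSize(WITH_REGION_ID, older, newer));
    }
}
```

With the tie-break on regionId, the sorted map and the hash map agree on how many regions exist, and older regions order before newer ones.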
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5549: --- Status: Open (was: Patch Available) Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231242#comment-13231242 ] Hadoop QA commented on HBASE-5549: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518665/5549.v9.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 20 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1202//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1202//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1202//console This message is automatically generated. Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. 
This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-5549: --- Status: Patch Available (was: Open) Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5589) Add of the offline call to the Master Interface
[ https://issues.apache.org/jira/browse/HBASE-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-5589: - Assignee: Jonathan Hsieh Add of the offline call to the Master Interface --- Key: HBASE-5589 URL: https://issues.apache.org/jira/browse/HBASE-5589 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.6, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Hbck from HBASE-5128 requires an offline method on the master to properly clean up state during certain assignment repair operations. This method will be added to recent and older versions of HBase. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh reassigned HBASE-5588: - Assignee: Jonathan Hsieh Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5592) Make it easier to get a table from shell
Make it easier to get a table from shell Key: HBASE-5592 URL: https://issues.apache.org/jira/browse/HBASE-5592 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.94.0 Reporter: Ben West Assignee: Ben West Priority: Trivial Fix For: 0.94.0 Attachments: publicTable.patch The one argument constructor to HTable was removed at some point, which means that you now have to pass in a Configuration to instantiate an HTable. This is annoying for me when I create quick scripts. This JIRA is a tiny patch which lets you get an HTable instance in the shell by doing {code}foo_table = @shell.hbase_table('foo').table{code} Basically, it is changing table to be a public member rather than a private one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231284#comment-13231284 ] Hadoop QA commented on HBASE-5206: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518673/5206_trunk_latest_2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.TestZooKeeper Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1203//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1203//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1203//console This message is automatically generated. 
Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231299#comment-13231299 ] Hadoop QA commented on HBASE-5549: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518676/5549.v10.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 20 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1204//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1204//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1204//console This message is automatically generated. Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. 
This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231306#comment-13231306 ] Zhihong Yu commented on HBASE-5206: --- Recent result is better. Good progress. {code} + * @return True if the table is present {code} Please change 'True' to 'true' {code} +while (!ZKTable.isEnabledTable(zkw, testMasterAdmin)) { + Thread.sleep(100); +} {code} Please shorten sleep interval to 10. A timeout is desirable for the wait. Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
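The review request above (a shorter sleep interval plus a timeout on the wait) could be sketched roughly as follows. WaitUtil and waitFor are hypothetical names for this illustration only; in the test under review, the condition would be something like the ZKTable.isEnabledTable(zkw, testMasterAdmin) check quoted in the comment.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: polls a condition with a bounded wait. */
class WaitUtil {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * @return true if the condition became true, false on timeout or interruption
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller can fail the test with a clear message
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Condition becomes true after roughly 50 ms; poll every 10 ms, give up after 1 s.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

Returning a boolean rather than throwing lets the calling test decide how to report a timeout.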
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231305#comment-13231305 ] nkeywal commented on HBASE-5549: Can be committed. Hopefully this is the end of the ZooKeeper expiry flakiness. Master can fail if ZooKeeper session expires Key: HBASE-5549 URL: https://issues.apache.org/jira/browse/HBASE-5549 Project: HBase Issue Type: Bug Components: master, zookeeper Affects Versions: 0.96.0 Environment: all Reporter: nkeywal Assignee: nkeywal Priority: Minor Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, it can happen when: - master zookeeper starts - zookeeper connection is cut - master enters the retry loop - in the meantime the session expires - the network comes back, the session is recreated - the retries continues, but on the wrong object, hence fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.5 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
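As a rough illustration of the trailing-dot handling that HBASE-5593 refers to: a PTR lookup can return a fully qualified name ending in ".", which then fails to match the same hostname written without the dot. The helper below is a hypothetical sketch (HostnameUtil and stripTrailingDot are names invented here), not the code from HBASE-4109 or from HMaster.

```java
/** Hypothetical helper for illustration; not an HBase class. */
class HostnameUtil {
    /** Removes a single trailing dot, as returned by some reverse DNS (PTR) lookups. */
    static String stripTrailingDot(String hostname) {
        if (hostname != null && hostname.endsWith(".")) {
            return hostname.substring(0, hostname.length() - 1);
        }
        return hostname;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDot("rs1.example.com.")); // rs1.example.com
        System.out.println(stripTrailingDot("rs1.example.com"));  // rs1.example.com
    }
}
```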
[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5568: -- Attachment: (was: HBASE-5568v2.patch) Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memsize that was flushed to HDFS. This can make HRegion.memstoreSize negative and prevent the next flush when we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3
2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
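The accounting problem described in HBASE-5568 can be modeled in a few lines. The sketch below is a simplified model, not HBase code: `FlushAccounting`, `Pending`, `startFlush`, `finishByCounter`, and `finishBySnapshot` are hypothetical names, and the interleaving is replayed sequentially so the result is deterministic. It assumes, as one plausible reading of the description and logs, that each flush subtracts the counter value it read when it started rather than the bytes it actually wrote out.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FlushAccounting {
    /** What a flush remembers between its start and its end. */
    public static final class Pending {
        final long counterAtStart; // value of the size counter when the flush began
        final long snapshotBytes;  // bytes this flush actually took out of the memstore

        Pending(long counterAtStart, long snapshotBytes) {
            this.counterAtStart = counterAtStart;
            this.snapshotBytes = snapshotBytes;
        }
    }

    private final AtomicLong memstoreSize = new AtomicLong();
    private long unflushed; // bytes written since the last snapshot

    public synchronized void write(long bytes) {
        unflushed += bytes;
        memstoreSize.addAndGet(bytes);
    }

    public synchronized Pending startFlush() {
        Pending p = new Pending(memstoreSize.get(), unflushed);
        unflushed = 0;
        return p;
    }

    /** Suspect accounting: subtract whatever the counter read at flush start. */
    public long finishByCounter(Pending p) {
        return memstoreSize.addAndGet(-p.counterAtStart);
    }

    /** Alternative accounting: subtract only what this flush wrote out. */
    public long finishBySnapshot(Pending p) {
        return memstoreSize.addAndGet(-p.snapshotBytes);
    }

    public static void main(String[] args) {
        for (boolean byCounter : new boolean[] {true, false}) {
            FlushAccounting region = new FlushAccounting();
            region.write(128);
            Pending first = region.startFlush();  // sees 128, snapshots 128
            region.write(7);                      // writes continue during the flush
            Pending second = region.startFlush(); // sees 135, snapshots only 7
            long afterFirst = byCounter
                    ? region.finishByCounter(first) : region.finishBySnapshot(first);
            long afterSecond = byCounter
                    ? region.finishByCounter(second) : region.finishBySnapshot(second);
            System.out.println((byCounter ? "by counter: " : "by snapshot: ")
                    + afterFirst + ", " + afterSecond);
        }
    }
}
```

With the counter-based variant the second flush drives the size to -128, which mirrors the negative memstoreSize reported in the issue; the snapshot-based variant ends at 0.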
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231318#comment-13231318 ] Zhihong Yu commented on HBASE-5568: --- Patch v2 looks good. Will integrate if there is no objection. Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memsize that was flushed to HDFS. This can make HRegion.memstoreSize negative and prevent the next flush when we close this region.
[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler
[ https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5490: - Fix Version/s: (was: 0.90.6) 0.90.7 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler Key: HBASE-5490 URL: https://issues.apache.org/jira/browse/HBASE-5490 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Fix For: 0.90.7 Attachments: 5490-v2.txt, HBASE-5490.patch The newly added state RS_ZK_REGION_FAILED_OPEN was causing rolling restarts to fail, so move the new enum value to the end of the list.
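The issue text does not spell out why the position of the new value matters. A common reason, assumed here rather than confirmed by the thread, is that processes of different versions exchange the enum by position (its ordinal), so inserting a value in the middle renumbers every value after it. The sketch below uses hypothetical placeholder constants (`STATE_A`, `STATE_B`, `STATE_C`) next to the real name RS_ZK_REGION_FAILED_OPEN; it is not the actual EventHandler enum.

```java
public class EnumOrdinalShift {
    /** The enum as an old process knows it. */
    public enum V1 { STATE_A, STATE_B, STATE_C }

    /** New value inserted in the middle: later ordinals shift by one. */
    public enum V2Inserted { STATE_A, RS_ZK_REGION_FAILED_OPEN, STATE_B, STATE_C }

    /** New value appended at the end: existing ordinals are unchanged. */
    public enum V2Appended { STATE_A, STATE_B, STATE_C, RS_ZK_REGION_FAILED_OPEN }

    /** Decodes an ordinal the way a peer that exchanges enum positions would. */
    public static <E extends Enum<E>> String decode(Class<E> type, int ordinal) {
        return type.getEnumConstants()[ordinal].name();
    }

    public static void main(String[] args) {
        int sentByOldProcess = V1.STATE_B.ordinal();
        // A new process with the inserted value misreads the old process's STATE_B.
        System.out.println("inserted: " + decode(V2Inserted.class, sentByOldProcess));
        // A new process with the appended value reads it correctly.
        System.out.println("appended: " + decode(V2Appended.class, sentByOldProcess));
    }
}
```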
[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5153: - Fix Version/s: (was: 0.90.6) 0.90.7 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.7 Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. From that issue we know that if multiple threads share the same connection and the connection is aborted in one thread, the other threads get a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception. That change solved the problem of stale connections not being removed, but the original HTable instance can no longer be used; the connection in HTable has to be recreated. There are two approaches to solve this: 1. In user code, on catching an IOE, close the connection and re-create the HTable instance. This can be used as a workaround. 2. On the HBase client side, catch this exception and re-create the connection.
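Approach 1 above (close and re-create on IOException in user code) can be sketched generically. The code below deliberately avoids the HBase client API: `Table`, `ReconnectOnFailure`, and the factory are hypothetical, and the single retry after re-creation is an assumption made for illustration, not something the issue prescribes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

public class ReconnectOnFailure {
    /** Hypothetical stand-in for a table handle backed by a shared connection. */
    public interface Table extends Closeable {
        String get(String row) throws IOException;
    }

    private final Supplier<Table> factory;
    private Table table;

    public ReconnectOnFailure(Supplier<Table> factory) {
        this.factory = factory;
        this.table = factory.get();
    }

    /** Reads a row; on IOException closes the handle, re-creates it, and tries once more. */
    public String get(String row) throws IOException {
        try {
            return table.get(row);
        } catch (IOException firstFailure) {
            try {
                table.close();
            } catch (IOException ignored) {
                // the handle is being discarded anyway
            }
            table = factory.get();
            return table.get(row); // a second failure propagates to the caller
        }
    }

    public static void main(String[] args) throws IOException {
        final int[] created = {0};
        Supplier<Table> factory = () -> {
            final boolean broken = created[0]++ == 0; // only the first handle is broken
            return new Table() {
                @Override
                public String get(String row) throws IOException {
                    if (broken) {
                        throw new IOException("connection closed");
                    }
                    return "value-of-" + row;
                }

                @Override
                public void close() {
                }
            };
        };
        ReconnectOnFailure client = new ReconnectOnFailure(factory);
        System.out.println(client.get("row1") + " after " + created[0] + " handles");
    }
}
```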
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231324#comment-13231324 ] Zhihong Yu commented on HBASE-5568: --- @Chunhui: A patch for 0.92 is needed: {code} p0 HBASE-5568v2.patch ... Hunk #1 FAILED at 152. 1 out of 1 hunk FAILED -- saving rejects to file src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java.rej {code} Please update the patch for 0.90 as well. Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memsize that was flushed to HDFS. This can make HRegion.memstoreSize negative and prevent the next flush when we close this region.
[jira] [Updated] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5588: -- Status: Patch Available (was: Open) Suite against the four versions is running. Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.92.0, 0.90.5, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, hbase-5588.patch This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk.
[jira] [Updated] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-5588: -- Attachment: hbase-5588-0.90.patch hbase-5588-0.94.patch hbase-5588.patch hbase-5588.patch removes clearRegionFromTransition; hbase-5588-0.94.patch deprecates it and is also applicable to 0.92; hbase-5588-0.90.patch deprecates it. Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, hbase-5588.patch This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk.
[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231339#comment-13231339 ] Hadoop QA commented on HBASE-5588: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518692/hbase-5588-0.90.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1205//console This message is automatically generated. Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, hbase-5588.patch This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk.
[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231341#comment-13231341 ] Zhihong Yu commented on HBASE-5588: --- +1 on patches. Hopefully Hadoop QA can pick up hbase-5588.patch. Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, hbase-5588.patch This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk.
[jira] [Updated] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
[ https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David S. Wang updated HBASE-5593: - Attachment: HBASE-5593.patch Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 Attachments: HBASE-5593.patch HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup().
[jira] [Updated] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
[ https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David S. Wang updated HBASE-5593: - Affects Version/s: (was: 0.90.5) 0.90.6 Status: Patch Available (was: Open) One-liner change. Ran through Jenkins. Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 Attachments: HBASE-5593.patch HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup().
[jira] [Commented] (HBASE-4269) Add tests and restore semantics to TableInputFormat/TableRecordReader
[ https://issues.apache.org/jira/browse/HBASE-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231356#comment-13231356 ] Jonathan Hsieh commented on HBASE-4269: --- Jan, when I did this patch, the two versions had different error recovery paths and I made them similar. UnknownScannerException is a subclass of DoNotRetryIOException, so I chose that instead. I'm assuming this is causing some pain now -- how is this affecting the job you are running? (Is it catching and rethrowing other exceptions as well?) If there is something we need to change, I'm fine with that. Let's file a new issue -- this patch has been included in a few releases now. Add tests and restore semantics to TableInputFormat/TableRecordReader - Key: HBASE-4269 URL: https://issues.apache.org/jira/browse/HBASE-4269 Project: HBase Issue Type: Improvement Components: mapred, mapreduce, test Affects Versions: 0.90.5, 0.92.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Fix For: 0.90.5 Attachments: 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch, 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch HBASE-4196 modified the semantics of failures in TableInputFormat/TableRecordReader and had no test cases. This patch restores the semantics, rethrowing when a DoNotRetryIOException is triggered, and adds test cases.
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231361#comment-13231361 ] Zhihong Yu commented on HBASE-5568: --- Integrated to TRUNK. Thanks for the patch Chunhui. Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memsize that was flushed to HDFS. This can make HRegion.memstoreSize negative and prevent the next flush when we close this region.
[jira] [Issue Comment Edited] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231361#comment-13231361 ] Zhihong Yu edited comment on HBASE-5568 at 3/16/12 4:41 PM: Integrated to TRUNK. Thanks for the patch Chunhui. Thanks for the review Ramkrishna and Lars. was (Author: zhi...@ebaysf.com): Integrated to TRUNK. Thanks for the patch Chunhui. Multi concurrent flushcache() for one region could cause data loss -- Key: HBASE-5568 URL: https://issues.apache.org/jira/browse/HBASE-5568 Project: HBase Issue Type: Bug Components: regionserver Reporter: chunhui shen Assignee: chunhui shen Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch HRegion#flushcache() can currently be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memsize that was flushed to HDFS. This can make HRegion.memstoreSize negative and prevent the next flush when we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3
2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
-- This message is automatically generated by JIRA.
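The accounting problem described in HBASE-5568 can be illustrated with a small model. This is only a sketch under stated assumptions: `ToyRegion`, `readFlushSize`, `snapshot` and `finishFlush` are hypothetical names, not HBase code, and the sketch does not claim to show how the attached patch fixes the issue. Two flushers are interleaved by hand so the outcome is deterministic: when both read the counter before either takes its snapshot, the counter is decremented twice for the same data.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of a region's memstore accounting; every name here is hypothetical. */
class ToyRegion {
    final AtomicLong memstoreSize = new AtomicLong(); // what the region believes is buffered
    private long buffered;                            // bytes really held in the memstore

    void write(long bytes) {
        synchronized (this) { buffered += bytes; }
        memstoreSize.addAndGet(bytes);
    }

    /** Racy step: read the counter before the snapshot is taken. */
    long readFlushSize() { return memstoreSize.get(); }

    /** Take the snapshot and report how many bytes it really holds. */
    synchronized long snapshot() {
        long taken = buffered;
        buffered = 0;
        return taken;
    }

    /** Subtract whatever size the caller decided on and return the new counter. */
    long finishFlush(long size) { return memstoreSize.addAndGet(-size); }
}

class FlushAccountingDemo {
    /** Both flushers read the counter first, then snapshot: the counter goes negative. */
    static long racy() {
        ToyRegion r = new ToyRegion();
        r.write(128);
        long sizeA = r.readFlushSize(); // flusher A sees 128
        long sizeB = r.readFlushSize(); // flusher B also sees 128
        r.snapshot();                   // A really takes 128
        r.snapshot();                   // B really takes 0
        r.finishFlush(sizeA);
        return r.finishFlush(sizeB);
    }

    /** Each flusher subtracts only what its own snapshot contained. */
    static long safe() {
        ToyRegion r = new ToyRegion();
        r.write(128);
        long takenA = r.snapshot();
        long takenB = r.snapshot();
        r.finishFlush(takenA);
        return r.finishFlush(takenB);
    }

    public static void main(String[] args) {
        System.out.println("racy counter: " + racy()); // -128
        System.out.println("safe counter: " + safe()); // 0
    }
}
```

The point of the sketch is the ordering: the size that is subtracted has to be derived from the snapshot that was actually flushed, not from a counter value read earlier.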
[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
[ https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231363#comment-13231363 ] Hadoop QA commented on HBASE-5593: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518698/HBASE-5593.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1206//console This message is automatically generated. Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 Attachments: HBASE-5593.patch HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
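For readers unfamiliar with the trailing-dot problem: a PTR record is a fully qualified name, so a reverse lookup can return `host.example.com.` with a final dot. A minimal sketch of the normalization follows, assuming a hypothetical helper named `stripTrailingDot`; it is not taken from HBASE-5593.patch or from HBASE-4109.

```java
/** Hypothetical helper; not the code from HBASE-5593.patch. */
class HostnameUtil {
    /** Returns the name without one trailing dot; other input is returned unchanged. */
    static String stripTrailingDot(String hostname) {
        if (hostname != null && hostname.endsWith(".")) {
            return hostname.substring(0, hostname.length() - 1);
        }
        return hostname;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDot("rs1.example.com.")); // rs1.example.com
        System.out.println(stripTrailingDot("rs1.example.com"));  // rs1.example.com
    }
}
```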
[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
[ https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231370#comment-13231370 ] David S. Wang commented on HBASE-5593: -- This will not apply to trunk as the robot is trying to do, because the fix is only applicable to 0.90.x. Also, I can add a test case if necessary, though I think the fix is fairly obvious in this case. Let me know. Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 Attachments: HBASE-5593.patch HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5594) Unable to stop a master that's waiting on -ROOT- during initialization
Unable to stop a master that's waiting on -ROOT- during initialization
Key: HBASE-5594
URL: https://issues.apache.org/jira/browse/HBASE-5594
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Jean-Daniel Cryans
Fix For: 0.92.2, 0.94.0, 0.96.0
We just had a case where the master (that was just restarted) was having a hard time assigning -ROOT- (all the PRI handlers were full already) so we tried to shutdown the cluster and even though all the RS closed down properly the master kept running being blocked on:
{noformat}
master-sv4r20s12,10302,1331916142866 prio=10 tid=0x7f3708008800 nid=0x4b20 in Object.wait() [0x7f370d1d]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
	- locked 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
	- locked 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
	at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:313)
	at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:571)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:336)
	at java.lang.Thread.run(Thread.java:662)
{noformat}
I haven't checked the 0.90 code, we got this on 0.92.1
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
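The trace above shows a thread parked in Object.wait() with nothing to wake it once shutdown is requested. One general way to make such a wait stoppable is sketched below. `StoppableTracker` and its methods are hypothetical and are not the ZooKeeperNodeTracker API, and this is not necessarily the fix chosen for HBASE-5594; the sketch only shows a wait loop that also observes a stop flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical tracker whose blocking wait can be cut short by stop(). */
class StoppableTracker {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private byte[] data; // guarded by this

    synchronized void setData(byte[] d) {
        data = d;
        notifyAll(); // wake waiters: data is available
    }

    void stop() {
        stopped.set(true);
        synchronized (this) { notifyAll(); } // wake waiters: give up
    }

    /**
     * Blocks until data is set or stop() is called.
     * Returns null if stopped (or interrupted) before data arrived.
     */
    synchronized byte[] blockUntilAvailableOrStopped(long checkIntervalMs) {
        while (data == null && !stopped.get()) {
            try {
                wait(checkIntervalMs); // bounded wait, so the flag is re-checked regularly
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return null;
            }
        }
        return data;
    }
}
```

A caller in a shutdown path would invoke `stop()` and the blocked thread would return `null` instead of waiting indefinitely.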
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: 5206_trunk_latest_3.patch Patch v3 addresses my review comments. TestAdmin#testEnableDisableAddColumnDeleteColumn passes. Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231418#comment-13231418 ]
Zhihong Yu commented on HBASE-5549:
TestMasterZKSessionRecovery is removed. Is it covered in other tests now?
Master can fail if ZooKeeper session expires
Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
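The failure sequence in the description comes down to a retry loop that keeps using an object which has since been replaced. The sketch below models that with hypothetical names (`Conn`, `retry`, `demo`); it is not RecoverableZooKeeper or ZooKeeperWatcher code and makes no claim about what the 5549 patches do. It only shows the difference that re-reading the current handle on every attempt makes.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical model of retrying against the current connection object. */
class RetryOnCurrent {
    /** Stand-in for a connection wrapper; alive turns false when the session expires. */
    static final class Conn {
        final String name;
        volatile boolean alive = true;
        Conn(String name) { this.name = name; }
        String read() {
            if (!alive) throw new IllegalStateException("session expired: " + name);
            return "data via " + name;
        }
    }

    /** Retries op, looking up the current connection again before each attempt. */
    static <T> T retry(AtomicReference<Conn> current, Function<Conn, T> op, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            Conn c = current.get(); // not captured once outside the loop
            try {
                return op.apply(c);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Simulates expiry, then recreation of the connection after the first failed attempt. */
    static String demo() {
        AtomicReference<Conn> current = new AtomicReference<>(new Conn("old"));
        Conn old = current.get();
        old.alive = false; // the session expired
        return retry(current, c -> {
            try {
                return c.read();
            } finally {
                current.compareAndSet(old, new Conn("new")); // the recreation happens meanwhile
            }
        }, 3);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // data via new
    }
}
```

Had the loop held on to `old` for all attempts, every retry would have failed in the same way, which is the behaviour the issue describes.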
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231441#comment-13231441 ]
nkeywal commented on HBASE-5549:
Yes, see HBASE-5572 for the reasons...
Master can fail if ZooKeeper session expires
Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231444#comment-13231444 ]
nkeywal commented on HBASE-5549:
Can't create a review, I got error 500 as well...
Master can fail if ZooKeeper session expires
Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231447#comment-13231447 ]
Hadoop QA commented on HBASE-5206:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518704/5206_trunk_latest_3.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 18 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
    org.apache.hadoop.hbase.mapreduce.TestImportTsv
    org.apache.hadoop.hbase.mapred.TestTableMapReduce
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
    org.apache.hadoop.hbase.TestZooKeeper
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1207//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1207//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1207//console
This message is automatically generated.
Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231453#comment-13231453 ]
nkeywal commented on HBASE-5549:
Now it works: https://reviews.apache.org/r/4391/
Master can fail if ZooKeeper session expires
Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem
Fix NoSuchMethodException in 0.92 when running on local filesystem
Key: HBASE-5595
URL: https://issues.apache.org/jira/browse/HBASE-5595
Project: HBase
Issue Type: Bug
Reporter: stack
Priority: Critical
Fix For: 0.92.2
Fix this ugly exception that shows when running 0.92.1 when on local filesystem:
{code}
2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
	at java.lang.Class.getDeclaredMethod(Class.java:1937)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:408)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:331)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
	at java.lang.Thread.run(Thread.java:680)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
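The trace above comes from a reflective lookup of a method that some output stream implementations do not have, so the exception is an expected outcome rather than an error. A minimal sketch of probing for an optional method without surfacing a stack trace follows. `findOptionalMethod` is a hypothetical helper, not the HLog code and not the eventual fix; it uses only the standard `Class.getDeclaredMethod` call that appears in the trace.

```java
import java.lang.reflect.Method;

/** Hypothetical helper for looking up a method that may legitimately be absent. */
class OptionalMethodProbe {
    /** Returns the no-argument method declared on type, or null if there is none. */
    static Method findOptionalMethod(Class<?> type, String name) {
        try {
            return type.getDeclaredMethod(name);
        } catch (NoSuchMethodException e) {
            // Expected on implementations without the method: report absence quietly.
            return null;
        } catch (SecurityException e) {
            // Lookup not permitted: treat as unavailable as well.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(findOptionalMethod(String.class, "length") != null);                // true
        System.out.println(findOptionalMethod(String.class, "getNumCurrentReplicas") != null); // false
    }
}
```

A caller could then log a single line saying the feature is unavailable instead of printing the exception.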
[jira] [Commented] (HBASE-5520) Support reseek() at RegionScanner
[ https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231459#comment-13231459 ]
Lars Hofhansl commented on HBASE-5520:
+1 on 0.94. Another question about the patch:
{code}
+public synchronized boolean reseek(byte[] row) throws IOException {
...
+ startRegionOperation();
{code}
# why start a region operation here? This is only called from a coprocessor, right? So it should already be in a region operation.
# why synchronized?
Support reseek() at RegionScanner
Key: HBASE-5520
URL: https://issues.apache.org/jira/browse/HBASE-5520
Project: HBase
Issue Type: Improvement
Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
Fix For: 0.96.0
Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, HBASE-5520_3.patch
reseek() is not currently supported at the RegionScanner level. We can support it. This issue was created following the discussion under HBASE-2038.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: (was: 5206_trunk_latest_3.patch) Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231464#comment-13231464 ]
Hudson commented on HBASE-5568:
Integrated in HBase-TRUNK #2684 (See [https://builds.apache.org/job/HBase-TRUNK/2684/])
HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301639)
Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java
Multi concurrent flushcache() for one region could cause data loss
Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch
HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush if we close this region.
[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhihong Yu updated HBASE-5206: -- Attachment: (was: 5206_trunk-v3.patch) Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231467#comment-13231467 ]
Zhihong Yu commented on HBASE-5568:
Integrated to 0.94 as well.
Multi concurrent flushcache() for one region could cause data loss
Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch
HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we found that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache() we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush if we close this region.
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231491#comment-13231491 ]
stack commented on HBASE-5549:
@N LGTM. Change reconnectAfterExpiry to reconnectAfterExpiration and post the patch here. I will run it through hadoopqa, then commit if all is good.
Master can fail if ZooKeeper session expires
Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher to be recreated before using the connection. This can happen in real life, when:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231495#comment-13231495 ]

Mikhail Bautin commented on HBASE-5581:

Is this OK to commit to trunk? Could someone +1? Thanks!

Creating a table with invalid syntax does not give an error message when it fails

Key: HBASE-5581
URL: https://issues.apache.org/jira/browse/HBASE-5581
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Binu John
Priority: Minor
Attachments: D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => []} true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
0 row(s) in 2.7860 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

We should throw an error if we can't create the CF so it's clear to the user.
[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231502#comment-13231502 ]

Lars Hofhansl commented on HBASE-5581:

+1

Creating a table with invalid syntax does not give an error message when it fails

Key: HBASE-5581
URL: https://issues.apache.org/jira/browse/HBASE-5581
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Binu John
Priority: Minor
Attachments: D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => []} true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
0 row(s) in 2.7860 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

We should throw an error if we can't create the CF so it's clear to the user.
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231507#comment-13231507 ]

stack commented on HBASE-5568:

+1 Good find Chunhui.

Multi concurrent flushcache() for one region could cause data loss

Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch

HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we find that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache(), we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush if we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 128.0m 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 134.8m 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. 
in 2769ms, sequenceid=619316544, compaction requested=false 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 6.8m 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a05 6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true -- This message is automatically generated by JIRA. 
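The accounting problem described in HBASE-5568 can be illustrated with a toy model. The sketch below is not HBase code: `ToyRegion`, its fields, and the byte counts are hypothetical, and the interleaving is replayed sequentially rather than with real threads. It contrasts subtracting the size each flush read when it started with subtracting the size each flush actually snapshotted. The second variant is only one plausible remedy, assumed here for illustration, and is not a description of the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: ToyRegion is hypothetical and tracks byte counts, not real data.
class FlushAccountingSketch {

    static final class ToyRegion {
        final AtomicLong memstoreSize = new AtomicLong(); // accounted size
        private long unflushed;                           // bytes really held in memory

        void write(long bytes) {
            unflushed += bytes;
            memstoreSize.addAndGet(bytes);
        }

        // Hands everything currently in memory to the calling flush.
        long snapshot() {
            long taken = unflushed;
            unflushed = 0;
            return taken;
        }
    }

    // Each flush subtracts the accounted size it read when it started.
    static long subtractSizeReadAtStart() {
        ToyRegion region = new ToyRegion();
        region.write(128);
        long seenByA = region.memstoreSize.get();       // 128
        region.snapshot();                              // flush A takes 128
        region.write(7);                                // writes arrive while A is still running
        long seenByB = region.memstoreSize.get();       // 135: A has not subtracted yet
        region.snapshot();                              // flush B takes only 7
        region.memstoreSize.addAndGet(-seenByA);        // 135 - 128 = 7
        return region.memstoreSize.addAndGet(-seenByB); // 7 - 135 = -128
    }

    // Each flush subtracts only what its own snapshot contained.
    static long subtractSnapshottedSize() {
        ToyRegion region = new ToyRegion();
        region.write(128);
        long takenByA = region.snapshot();               // 128
        region.write(7);
        long takenByB = region.snapshot();               // 7
        region.memstoreSize.addAndGet(-takenByA);        // 135 - 128 = 7
        return region.memstoreSize.addAndGet(-takenByB); // 7 - 7 = 0
    }

    public static void main(String[] args) {
        System.out.println("size read at start: " + subtractSizeReadAtStart());
        System.out.println("snapshotted size:   " + subtractSnapshottedSize());
    }
}
```

The first variant ends with a negative accounted size, which mirrors the symptom reported in the issue; the second ends at zero.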
[jira] [Updated] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5581:

Attachment: 5581trunk.patch

What I committed to trunk and 0.94. It's a one-liner only.

Creating a table with invalid syntax does not give an error message when it fails

Key: HBASE-5581
URL: https://issues.apache.org/jira/browse/HBASE-5581
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Binu John
Priority: Minor
Fix For: 0.94.0, 0.96.0
Attachments: 5581trunk.patch, D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => []} true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
0 row(s) in 2.7860 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

We should throw an error if we can't create the CF so it's clear to the user.
[jira] [Resolved] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-5581.

Resolution: Fixed
Hadoop Flags: Reviewed

Thanks for the patch Binu. Committed trunk and 0.94.

Creating a table with invalid syntax does not give an error message when it fails

Key: HBASE-5581
URL: https://issues.apache.org/jira/browse/HBASE-5581
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Binu John
Priority: Minor
Fix For: 0.94.0, 0.96.0
Attachments: 5581trunk.patch, D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => []} true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
0 row(s) in 2.7860 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

We should throw an error if we can't create the CF so it's clear to the user.
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231517#comment-13231517 ]

Zhihong Yu commented on HBASE-5568:

Integrated to 0.90 branch as well.

Multi concurrent flushcache() for one region could cause data loss

Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch

HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we find that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache(), we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush if we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 128.0m 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 134.8m 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. 
in 2769ms, sequenceid=619316544, compaction requested=false 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 6.8m 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a05 6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231526#comment-13231526 ] Hadoop QA commented on HBASE-5206: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518712/5206_trunk_latest_3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1208//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1208//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1208//console This message is automatically generated. 
Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231538#comment-13231538 ] Zhihong Yu commented on HBASE-5206: --- Integrated 5206_trunk_latest_3.patch to TRUNK. Port HBASE-5155 to 0.92, 0.94, and TRUNK Key: HBASE-5206 URL: https://issues.apache.org/jira/browse/HBASE-5206 Project: HBase Issue Type: Bug Affects Versions: 0.92.2, 0.94.0, 0.96.0 Reporter: Zhihong Yu Assignee: Ashutosh Jindal Fix For: 0.92.2, 0.94.0, 0.96.0 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231539#comment-13231539 ]

Hudson commented on HBASE-5581:

Integrated in HBase-0.94 #34 (See [https://builds.apache.org/job/HBase-0.94/34/])
HBASE-5581 Creating a table with invalid syntax does not give an error message when it fails (Revision 1301690)
Result = SUCCESS
stack :
Files :
* /hbase/branches/0.94/src/main/ruby/hbase/admin.rb

Creating a table with invalid syntax does not give an error message when it fails

Key: HBASE-5581
URL: https://issues.apache.org/jira/browse/HBASE-5581
Project: HBase
Issue Type: Bug
Components: shell
Reporter: Binu John
Priority: Minor
Fix For: 0.94.0, 0.96.0
Attachments: 5581trunk.patch, D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => []} true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
0 row(s) in 2.7860 seconds
hbase(main):002:0> describe 'test'
DESCRIPTION ENABLED
{NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

We should throw an error if we can't create the CF so it's clear to the user.
[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss
[ https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231540#comment-13231540 ]

Hudson commented on HBASE-5568:

Integrated in HBase-0.94 #34 (See [https://builds.apache.org/job/HBase-0.94/34/])
HBASE-5568 Multi concurrent flushcache() for one region could cause data loss (Chunhui) (Revision 1301676)
Result = SUCCESS
tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java

Multi concurrent flushcache() for one region could cause data loss

Key: HBASE-5568
URL: https://issues.apache.org/jira/browse/HBASE-5568
Project: HBase
Issue Type: Bug
Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, HBASE-5568v2.patch

HRegion#flushcache() can now be called concurrently, through HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by HBaseAdmin. However, we find that if HRegion#internalFlushcache() is called concurrently by multiple threads, HRegion.memstoreSize is calculated incorrectly. At the end of HRegion#internalFlushcache(), we call this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize negative and prevents the next flush if we close this region.
Logs in RS for region e9d827913a056e696c39bc569ea3 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 128.0m 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 134.8m 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. 
in 2769ms, sequenceid=619316544, compaction requested=false 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3 f99f., current region memstore size 6.8m 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a 056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a05 6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA,
[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot
[ https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231551#comment-13231551 ] stack commented on HBASE-5593: -- @Ted You mean a unit test for the Strings.domainNamePointerToHostName facility? (We don't want to test InetSocketAddress). Reverse DNS resolution in regionServerStartup() does not strip trailing dot --- Key: HBASE-5593 URL: https://issues.apache.org/jira/browse/HBASE-5593 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: David S. Wang Assignee: David S. Wang Fix For: 0.90.7 Attachments: HBASE-5593.patch HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS lookups. We seem to have missed a case in HMaster#regionServerStartup(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
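For readers unfamiliar with the trailing-dot problem in HBASE-5593: a PTR lookup can return a fully qualified name such as `rs1.example.com.` and the final dot has to be removed before the name is compared with other hostnames. The following is a minimal sketch of that normalization and of the kind of unit-test cases discussed in the comment. `stripTrailingDot` and the example hostnames are hypothetical; this is not the source of Strings.domainNamePointerToHostName.

```java
// Hypothetical helper for illustration; not the HBase Strings utility itself.
class TrailingDotSketch {

    // Returns the name without a single trailing dot; null and dot-free names pass through.
    static String stripTrailingDot(String dnsName) {
        if (dnsName == null) {
            return null;
        }
        return dnsName.endsWith(".")
            ? dnsName.substring(0, dnsName.length() - 1)
            : dnsName;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDot("rs1.example.com.")); // fully qualified PTR form
        System.out.println(stripTrailingDot("rs1.example.com"));  // already normalized
    }
}
```

Testing a pure string helper like this keeps the test independent of InetSocketAddress and of any real DNS setup.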
[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition
[ https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231553#comment-13231553 ] stack commented on HBASE-5588: -- +1 on patch set Deprecate/remove AssignmentManager#clearRegionFromTransition Key: HBASE-5588 URL: https://issues.apache.org/jira/browse/HBASE-5588 Project: HBase Issue Type: Sub-task Components: hbck Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, hbase-5588.patch This method is essentially a dupe of Assignment#regionOffline. As suggested in early review of HBASE-5128 - deprecate up to 0.94 and remove from 0.96/trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-5592) Make it easier to get a table from shell
[ https://issues.apache.org/jira/browse/HBASE-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-5592. -- Resolution: Fixed Fix Version/s: 0.92.2 Hadoop Flags: Reviewed Committed trunk, 0.94, and 0.92 branches. Thanks for the patch Ben. Make it easier to get a table from shell Key: HBASE-5592 URL: https://issues.apache.org/jira/browse/HBASE-5592 Project: HBase Issue Type: Improvement Components: shell Affects Versions: 0.94.0 Reporter: Ben West Assignee: Ben West Priority: Trivial Labels: shell Fix For: 0.92.2, 0.94.0 Attachments: publicTable.patch The one argument constructor to HTable was removed at some point, which means that you now have to pass in a Configuration to instantiate an HTable. This is annoying for me when I create quick scripts. This JIRA is a tiny patch which lets you get an HTable instance in the shell by doing {code}foo_table = @shell.hbase_table('foo').table{code} Basically, it is changing table to be a public member rather than a private one. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId
[ https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231557#comment-13231557 ] stack commented on HBASE-5563: -- +1 on patch for trunk and 0.92 (again). Older regionids should appear earlier in a sorted list than newer regionids as per this patch. HRegionInfo#compareTo add the comparison of regionId Key: HBASE-5563 URL: https://issues.apache.org/jira/browse/HBASE-5563 Project: HBase Issue Type: Bug Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0 Reporter: chunhui shen Assignee: chunhui shen Attachments: HBASE-5563.patch, HBASE-5563v2.patch, HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch In the one region multi assigned case, we could find that two regions have the same table name, same startKey, same endKey, and different regionId, so these two regions are same in TreeMap but different in HashMap. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
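The TreeMap versus HashMap mismatch in HBASE-5563 comes from an ordering that is inconsistent with equals. A small sketch, assuming a simplified `ToyRegion` with string keys (hypothetical, not HRegionInfo, and ignoring details such as the special meaning of an empty end key), shows the effect of leaving the region id out of the comparison and of adding it as the last tie-breaker:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy model only: ToyRegion is a simplified, hypothetical stand-in for a region descriptor.
class RegionOrderingSketch {

    static final class ToyRegion {
        final String table;
        final String startKey;
        final String endKey;
        final long regionId;

        ToyRegion(String table, String startKey, String endKey, long regionId) {
            this.table = table;
            this.startKey = startKey;
            this.endKey = endKey;
            this.regionId = regionId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyRegion)) {
                return false;
            }
            ToyRegion r = (ToyRegion) o;
            return table.equals(r.table) && startKey.equals(r.startKey)
                && endKey.equals(r.endKey) && regionId == r.regionId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(table, startKey, endKey, regionId);
        }
    }

    // Ordering that ignores the region id, so it disagrees with equals().
    static final Comparator<ToyRegion> WITHOUT_ID =
        Comparator.comparing((ToyRegion r) -> r.table)
            .thenComparing((ToyRegion r) -> r.startKey)
            .thenComparing((ToyRegion r) -> r.endKey);

    // Same ordering with the region id as the final tie-breaker (older ids sort first).
    static final Comparator<ToyRegion> WITH_ID =
        WITHOUT_ID.thenComparingLong((ToyRegion r) -> r.regionId);

    static int treeMapSize(Comparator<ToyRegion> order) {
        Map<ToyRegion, String> regions = new TreeMap<>(order);
        regions.put(new ToyRegion("t", "a", "m", 200L), "server2");
        regions.put(new ToyRegion("t", "a", "m", 100L), "server1");
        return regions.size();
    }

    static int hashMapSize() {
        Map<ToyRegion, String> regions = new HashMap<>();
        regions.put(new ToyRegion("t", "a", "m", 200L), "server2");
        regions.put(new ToyRegion("t", "a", "m", 100L), "server1");
        return regions.size();
    }

    static long firstRegionIdWhenSorted() {
        TreeMap<ToyRegion, String> regions = new TreeMap<>(WITH_ID);
        regions.put(new ToyRegion("t", "a", "m", 200L), "server2");
        regions.put(new ToyRegion("t", "a", "m", 100L), "server1");
        return regions.firstKey().regionId;
    }

    public static void main(String[] args) {
        System.out.println("TreeMap without id: " + treeMapSize(WITHOUT_ID));
        System.out.println("TreeMap with id:    " + treeMapSize(WITH_ID));
        System.out.println("HashMap:            " + hashMapSize());
        System.out.println("first id sorted:    " + firstRegionIdWhenSorted());
    }
}
```

With the id-free ordering the sorted map treats the two regions as one key, while the hash map keeps both; with the id included, both maps agree.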
[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.
[ https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5578:
    Attachment: 5589.txt

How about this. Goes through Store and checks all Reader instances for null before using. We were doing this in half the cases already. Converts the NPE into a null warning. Means we don't crash. Puts off having to spend time on why the Reader is null at particular junctures. Should go into 0.94?

NPE when regionserver reported server load, caused rs stop.
    Key: HBASE-5578
    URL: https://issues.apache.org/jira/browse/HBASE-5578
    Project: HBase
    Issue Type: Bug
    Components: regionserver
    Affects Versions: 0.92.0
    Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
    Reporter: Storm Lee
    Priority: Critical
    Fix For: 0.92.2
    Attachments: 5589.txt

The regionserver log:
2012-03-11 11:55:37,808 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server data3,60020,1331286604591: Unhandled exception: null
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
    at java.lang.Thread.run(Thread.java:662)
2012-03-11 11:55:37,808 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
2012-03-11 11:55:37,808 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, blockCacheHitCount=87713, blockCacheMissCount=22144560, blockCacheEvictedCount=122, blockCacheHitRatio=0%, blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
2012-03-11 11:55:37,992 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled exception: null
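The approach in the comment (check each Reader for null, warn, and keep going instead of aborting) might look roughly like the sketch below. This is not the contents of 5589.txt: Reader, StoreFile, and uncompressedDataIndexSize are hypothetical stand-ins introduced here for illustration, and the warning goes to stderr rather than a real logger.

```java
import java.util.Arrays;
import java.util.List;

public class NullSafeIndexSize {
    /** Hypothetical stand-in for a store file reader. */
    interface Reader {
        long uncompressedDataIndexSize();
    }

    /** Hypothetical stand-in for a store file whose reader may be null. */
    static final class StoreFile {
        private final Reader reader;

        StoreFile(Reader reader) {
            this.reader = reader;
        }

        Reader getReader() {
            return reader;
        }
    }

    /** Sums index sizes, skipping files whose reader is null instead of throwing. */
    static long totalStaticIndexSize(List<StoreFile> files) {
        long size = 0;
        for (StoreFile f : files) {
            Reader r = f.getReader();
            if (r == null) {
                // Turn the would-be NullPointerException into a warning.
                System.err.println("WARN: store file reader is null; skipping");
                continue;
            }
            size += r.uncompressedDataIndexSize();
        }
        return size;
    }

    public static void main(String[] args) {
        List<StoreFile> files = Arrays.asList(
            new StoreFile(() -> 10L),
            new StoreFile(null),   // a file whose reader is missing
            new StoreFile(() -> 5L));
        System.out.println(totalStaticIndexSize(files));
    }
}
```

As the comment notes, a guard like this only keeps the server alive; it does not explain why the reader is null in the first place.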
[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.
[ https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5578:
    Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.
[ https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5578:
    Fix Version/s: 0.92.2
[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231579#comment-13231579 ]

Hudson commented on HBASE-5155:

Integrated in HBase-TRUNK #2685 (See [https://builds.apache.org/job/HBase-TRUNK/2685/])
    HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)
    Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java

ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
    Key: HBASE-5155
    URL: https://issues.apache.org/jira/browse/HBASE-5155
    Project: HBase
    Issue Type: Bug
    Components: master
    Affects Versions: 0.90.4
    Reporter: ramkrishna.s.vasudevan
    Assignee: ramkrishna.s.vasudevan
    Priority: Blocker
    Fix For: 0.90.6
    Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch

ServerShutDownHandler and the disable/delete table handlers race. This is not an issue due to TM.
- A regionserver goes down. In our cluster the regionserver holds a lot of regions.
- A region R1 has two daughters, D1 and D2.
- The ServerShutdownHandler gets called, scans META, and gets all the user regions.
- In parallel, a table is disabled. (No problem in this step.)
- Delete table is done.
- The table and its regions are deleted, including R1, D1 and D2. (So META is cleaned.)
- Now ServerShutdownHandler starts processDeadRegion:
{code}
if (hri.isOffline() && hri.isSplit()) {
  LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
    "; checking daughter presence");
  fixupDaughters(result, assignmentManager, catalogTracker);
{code}
As part of fixupDaughters, because the daughters D1 and D2 are missing for R1
{code}
if (isDaughterMissing(catalogTracker, daughter)) {
  LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
  MetaEditor.addDaughter(catalogTracker, daughter, null);
  // TODO: Log WARN if the regiondir does not exist in the fs. If its not
  // there then something wonky about the split -- things will keep going
  // but could be missing references to parent region.
  // And assign it.
  assignmentManager.assign(daughter, true);
{code}
we call assign on the daughters.
After this we continue with the code below.
{code}
if (processDeadRegion(e.getKey(), e.getValue(),
    this.services.getAssignmentManager(),
    this.server.getCatalogTracker())) {
  this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
When the SSH scanned META it had R1, D1 and D2. So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by
{code}
this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
This leads to a ZooKeeper issue due to a bad version and kills the master.
The important part here is that the regions that were deleted are recreated, which I think is more critical.
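The two failure modes in the report (assigning regions that were deleted after the META scan, and assigning the same daughter twice) can be illustrated with a small filter over the stale snapshot. This is only a sketch of the general idea and not the committed patch: regionsToAssign, liveMeta, and alreadyAssigned are hypothetical names, and regions are plain strings here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ShutdownRaceSketch {
    /**
     * Filters a stale snapshot of region names down to the ones that are
     * still present in the catalog and have not been assigned in this pass.
     * alreadyAssigned is updated as regions are selected.
     */
    static List<String> regionsToAssign(List<String> snapshot,
                                        Set<String> liveMeta,
                                        Set<String> alreadyAssigned) {
        List<String> out = new ArrayList<>();
        for (String region : snapshot) {
            if (!liveMeta.contains(region)) {
                continue; // deleted after the snapshot was taken; do not recreate
            }
            if (!alreadyAssigned.add(region)) {
                continue; // assigned earlier (for example by a daughter fixup)
            }
            out.add(region);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("R1", "D1", "D2");

        // Table deleted between the scan and processing: nothing to assign.
        System.out.println(regionsToAssign(snapshot, new HashSet<>(), new HashSet<>()));

        // D1 and D2 still exist, and D1 was already assigned once.
        Set<String> live = new HashSet<>(Arrays.asList("D1", "D2"));
        Set<String> assigned = new HashSet<>(Arrays.asList("D1"));
        System.out.println(regionsToAssign(snapshot, live, assigned));
    }
}
```

A real handler would also have to make the existence check and the assignment atomic with respect to the delete, which this sketch does not attempt.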
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231577#comment-13231577 ]

Hudson commented on HBASE-5206:

Integrated in HBase-TRUNK #2685 (See [https://builds.apache.org/job/HBase-TRUNK/2685/])
    HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)
    Result = SUCCESS
tedyu :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java

Port HBASE-5155 to 0.92, 0.94, and TRUNK
    Key: HBASE-5206
    URL: https://issues.apache.org/jira/browse/HBASE-5206
    Project: HBase
    Issue Type: Bug
    Affects Versions: 0.92.2, 0.94.0, 0.96.0
    Reporter: Zhihong Yu
    Assignee: Ashutosh Jindal
    Fix For: 0.92.2, 0.94.0, 0.96.0
    Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 5206_trunk_latest_3.patch

This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted) to 0.92 and TRUNK
[jira] [Commented] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.
[ https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231589#comment-13231589 ]

Hadoop QA commented on HBASE-5578:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12518726/5589.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 162 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
       org.apache.hadoop.hbase.regionserver.TestKeepDeletes
       org.apache.hadoop.hbase.regionserver.TestMinVersions
       org.apache.hadoop.hbase.regionserver.TestCompaction

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1209//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1209//console

This message is automatically generated.
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:
    Status: Open (was: Patch Available)

Master can fail if ZooKeeper session expires
    Key: HBASE-5549
    URL: https://issues.apache.org/jira/browse/HBASE-5549
    Project: HBase
    Issue Type: Bug
    Components: master, zookeeper
    Affects Versions: 0.96.0
    Environment: all
    Reporter: nkeywal
    Assignee: nkeywal
    Priority: Minor
    Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch

There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, so the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for the ZooKeeperWatcher to be recreated before using the connection.
This can happen in real life when:
- master zookeeper starts
- the zookeeper connection is cut
- the master enters the retry loop
- in the meantime the session expires
- the network comes back and the session is recreated
- the retries continue, but on the wrong object, and hence fail.
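The last step in the list (retries continuing on the wrong object) is easier to see in code. The sketch below is an illustration only, not the 5549 patch: Conn is a hypothetical stand-in for a connection that is replaced after session expiry, and retry is a made-up helper. The point it shows is that a retry loop has to look up the current handle on every attempt instead of capturing it once; backoff between attempts is omitted for brevity.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

public class RetryOnCurrentHandle {
    /** Hypothetical connection handle; an expired one fails every call. */
    static final class Conn {
        final boolean expired;

        Conn(boolean expired) {
            this.expired = expired;
        }

        String read(String path) {
            if (expired) {
                throw new IllegalStateException("session expired");
            }
            return "data@" + path;
        }
    }

    /** Retries op, asking the supplier for the current handle on each attempt. */
    static <T> T retry(Supplier<Conn> current, Function<Conn, T> op, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Conn conn = current.get(); // re-resolved every time, never cached
            try {
                return op.apply(conn);
            } catch (IllegalStateException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        AtomicReference<Conn> holder = new AtomicReference<>(new Conn(true));
        Conn stale = holder.get();
        holder.set(new Conn(false)); // the handle is recreated after expiry

        // Looking up the holder each attempt reaches the recreated handle.
        System.out.println(retry(holder::get, c -> c.read("/hbase/master"), 3));

        // Capturing the old handle keeps failing, as described in the report.
        try {
            retry(() -> stale, c -> c.read("/hbase/master"), 3);
        } catch (IllegalStateException e) {
            System.out.println("stale handle: " + e.getMessage());
        }
    }
}
```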
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231598#comment-13231598 ]

nkeywal commented on HBASE-5549:

v11 with the comments taken into account... Thank you for the review.
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5549:
    Attachment: 5549.v11.patch
[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails
[ https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231600#comment-13231600 ]

Mikhail Bautin commented on HBASE-5581:

Binu: thanks for the patch!
Stack: thanks for committing!

Creating a table with invalid syntax does not give an error message when it fails
    Key: HBASE-5581
    URL: https://issues.apache.org/jira/browse/HBASE-5581
    Project: HBase
    Issue Type: Bug
    Components: shell
    Reporter: Binu John
    Priority: Minor
    Fix For: 0.94.0, 0.96.0
    Attachments: 5581trunk.patch, D2343.1.patch

Creating a table with invalid syntax does not give an error message when it fails. In this case, it doesn't actually create the CF requested, but doesn't give any indication to the user that it failed.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => "HexStringSplit", COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
0 row(s) in 3.0930 seconds

hbase(main):002:0> describe 'test'
DESCRIPTION                                  ENABLED
 {NAME => 'test', FAMILIES => []}            true
1 row(s) in 0.1430 seconds

Putting {NUMREGIONS => 20, SPLITALGO => "HexStringSplit"} into a separate stanza works fine, so the feature is fine.

create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => "HexStringSplit"}
0 row(s) in 2.7860 seconds

hbase(main):002:0> describe 'test'
DESCRIPTION                                                                      ENABLED
 {NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE',   true
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
 VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536',
 BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

We should throw an error if we can't create the CF so it's clear to the user.
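One way to surface the failure is to validate each argument hash before acting on it and to raise an error when it mixes column-family options with table-level options. The sketch below shows that check in Java for illustration only; the HBase shell itself is written in Ruby, and this is not the committed patch. The key sets are an assumed subset taken from the example above, and validate is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CreateArgValidator {
    // Assumed subsets for illustration; the real shell accepts more options.
    static final Set<String> FAMILY_KEYS = new HashSet<>(Arrays.asList(
        "NAME", "VERSIONS", "BLOCKCACHE", "COMPRESSION", "BLOOMFILTER"));
    static final Set<String> TABLE_KEYS = new HashSet<>(Arrays.asList(
        "NUMREGIONS", "SPLITALGO"));

    /** Throws if a single argument hash is unknown or mixes the two kinds of option. */
    static void validate(Map<String, ?> spec) {
        boolean hasFamilyKey = false;
        boolean hasTableKey = false;
        for (String key : spec.keySet()) {
            if (FAMILY_KEYS.contains(key)) {
                hasFamilyKey = true;
            } else if (TABLE_KEYS.contains(key)) {
                hasTableKey = true;
            } else {
                throw new IllegalArgumentException("Unknown option: " + key);
            }
        }
        if (hasFamilyKey && hasTableKey) {
            throw new IllegalArgumentException(
                "Column family options and table options must be in separate stanzas");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> mixed = new LinkedHashMap<>();
        mixed.put("NAME", "test");
        mixed.put("VERSIONS", 1);
        mixed.put("NUMREGIONS", 20);
        mixed.put("SPLITALGO", "HexStringSplit");
        try {
            validate(mixed);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```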
[jira] [Updated] (HBASE-5575) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin updated HBASE-5575:
    Attachment: Enabling-lint-2012-03-16_13_40_37.patch

Configure Arcanist lint engine for HBase
    Key: HBASE-5575
    URL: https://issues.apache.org/jira/browse/HBASE-5575
    Project: HBase
    Issue Type: Improvement
    Reporter: Mikhail Bautin
    Assignee: Mikhail Bautin
    Attachments: Enabling-lint-2012-03-16_13_40_37.patch

We need to enable Arcanist lint engine in HBase, so that a commit could be checked by running arc lint.
[jira] [Commented] (HBASE-5575) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231604#comment-13231604 ]

Mikhail Bautin commented on HBASE-5575:

Reviewed at https://reviews.facebook.net/D2289.
[jira] [Commented] (HBASE-5575) Configure Arcanist lint engine for HBase
[ https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231613#comment-13231613 ]

Phabricator commented on HBASE-5575:

mbautin has committed the revision "[jira] [HBASE-5575] Configure Arcanist lint engine for HBase".

REVISION DETAIL
  https://reviews.facebook.net/D2289

COMMIT
  https://reviews.facebook.net/rHBASE1301751
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231619#comment-13231619 ]

Zhihong Yu commented on HBASE-5206:

Integrated to 0.94 as well.
[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK
[ https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231636#comment-13231636 ]

Hudson commented on HBASE-5206:

Integrated in HBase-0.94 #36 (See [https://builds.apache.org/job/HBase-0.94/36/])
    HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)
    Result = SUCCESS
tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java
[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
[ https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231637#comment-13231637 ]

Hudson commented on HBASE-5155:
---
Integrated in HBase-0.94 #36 (See [https://builds.apache.org/job/HBase-0.94/36/])
HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)

Result = SUCCESS
tedyu :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java

ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted
---
Key: HBASE-5155
URL: https://issues.apache.org/jira/browse/HBASE-5155
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
Fix For: 0.90.6
Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch

ServerShutdownHandler and the disable/delete table handlers race. This is not an issue due to TM.
- A regionserver goes down. In our cluster the regionserver holds a lot of regions.
- A region R1 has two daughters, D1 and D2.
- The ServerShutdownHandler gets called, scans META, and gets all the user regions.
- In parallel, a table is disabled. (No problem in this step.)
- Delete table is done.
- The table and its regions are deleted, including R1, D1 and D2, so META is cleaned.
- Now ServerShutdownHandler starts to process the dead region:
{code}
if (hri.isOffline() && hri.isSplit()) {
  LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
    "; checking daughter presence");
  fixupDaughters(result, assignmentManager, catalogTracker);
{code}
As part of fixupDaughters, because the daughters D1 and D2 are now missing for R1,
{code}
if (isDaughterMissing(catalogTracker, daughter)) {
  LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
  MetaEditor.addDaughter(catalogTracker, daughter, null);
  // TODO: Log WARN if the regiondir does not exist in the fs. If its not
  // there then something wonky about the split -- things will keep going
  // but could be missing references to parent region.
  // And assign it.
  assignmentManager.assign(daughter, true);
{code}
we call assign on the daughters. After this we continue with the code below.
{code}
if (processDeadRegion(e.getKey(), e.getValue(),
    this.services.getAssignmentManager(),
    this.server.getCatalogTracker())) {
  this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
When the SSH scanned META it had R1, D1 and D2. So as part of the above code, D1 and D2, which were already assigned by fixupDaughters, are assigned again by
{code}
this.services.getAssignmentManager().assign(e.getKey(), true);
{code}
This leads to a ZooKeeper bad-version error that kills the master. The important part here is that the regions that were deleted are recreated, which I think is more critical.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
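The race described in HBASE-5155 comes down to acting on a META snapshot that was taken before the table was deleted. The following is a minimal sketch of that idea only, not the committed patch: region names are plain strings, "META" is a set, and the class and method names (StaleSnapshotSketch, naivePlan, guardedPlan) are hypothetical. It shows how a plan built from the stale snapshot both resurrects deleted regions and queues the daughters twice, and how re-checking the live state and de-duplicating avoids both.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not HBase code.
class StaleSnapshotSketch {

    // Trusts the snapshot: a split parent triggers assignment of its
    // daughters, and the daughters are then assigned again as snapshot
    // entries in their own right.
    static List<String> naivePlan(List<String> snapshot,
                                  Map<String, List<String>> daughtersOf) {
        List<String> plan = new ArrayList<>();
        for (String region : snapshot) {
            List<String> daughters = daughtersOf.get(region);
            if (daughters != null) {
                plan.addAll(daughters); // fixup-style assignment
            } else {
                plan.add(region);       // regular assignment
            }
        }
        return plan;
    }

    // Re-checks each snapshot entry against the live META contents and
    // collects assignments in a set so that no region is queued twice.
    static List<String> guardedPlan(List<String> snapshot,
                                    Map<String, List<String>> daughtersOf,
                                    Set<String> liveMeta) {
        Set<String> plan = new LinkedHashSet<>();
        for (String region : snapshot) {
            if (!liveMeta.contains(region)) {
                continue; // deleted after the snapshot was taken
            }
            List<String> daughters = daughtersOf.get(region);
            if (daughters != null) {
                plan.addAll(daughters);
            } else {
                plan.add(region);
            }
        }
        return new ArrayList<>(plan);
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("R1", "D1", "D2");
        Map<String, List<String>> daughtersOf =
            Collections.singletonMap("R1", Arrays.asList("D1", "D2"));
        // Table deleted after the scan: live META is empty.
        System.out.println(naivePlan(snapshot, daughtersOf));
        System.out.println(guardedPlan(snapshot, daughtersOf, new HashSet<String>()));
    }
}
```

With the snapshot [R1, D1, D2] and an empty live META, the naive plan contains D1 and D2 twice each, while the guarded plan is empty.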
[jira] [Commented] (HBASE-5521) Move compression/decompression to an encoder specific encoding context
[ https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231644#comment-13231644 ]

Phabricator commented on HBASE-5521:

mbautin has commented on the revision HBASE-5521 [jira] Move compression/decompression to an encoder specific encoding context.

Yongqiang: we now have a linter available in HBase trunk. Could you please run arc lint, resolve lint warnings, and resubmit the diff with arc diff --preview?

REVISION DETAIL
https://reviews.facebook.net/D2097

Move compression/decompression to an encoder specific encoding context
--
Key: HBASE-5521
URL: https://issues.apache.org/jira/browse/HBASE-5521
Project: HBase
Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
Attachments: HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch

As part of working on HBASE-5313, we want to add a new columnar encoder/decoder. It makes sense to move compression to be part of encoder/decoder:
1) a scanner for a columnar encoded block can do lazy decompression to a specific part of a key value object
2) avoid an extra bytes copy from encoder to hblock-writer.
If there is no encoder specified for a writer, the HBlock.Writer will use a default compression-context to do something very similar to today's code.
[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231646#comment-13231646 ]

Hadoop QA commented on HBASE-5549:
--
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12518730/5549.v11.patch against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 20 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1210//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1210//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1210//console

This message is automatically generated.

Master can fail if ZooKeeper session expires

Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch

There is a retry mechanism in RecoverableZooKeeper, but when the session expires, the whole ZooKeeperWatcher is recreated, so the retry mechanism does not work in this case. This is why a sleep is needed in TestZooKeeper#testMasterSessionExpired: we need to wait for the ZooKeeperWatcher to be recreated before using the connection. This can also happen in real life, with the following sequence:
- master zookeeper starts
- zookeeper connection is cut
- master enters the retry loop
- in the meantime the session expires
- the network comes back, the session is recreated
- the retries continue, but on the wrong object, and hence fail.
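The sequence above can be illustrated with a small model. This is a sketch under stated assumptions, not the RecoverableZooKeeper code: Session is a hypothetical stand-in for a ZooKeeper handle, the AtomicReference stands in for whatever holds the current watcher, and the recreate callback simulates the session being recreated between attempts. A retry loop that captures the handle once keeps hitting the expired object; one that re-reads the holder on each attempt picks up the new session.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model, not the ZooKeeper or HBase API.
class RetryOnFreshHandleSketch {

    static final class Session {
        final int id;
        volatile boolean expired;

        Session(int id) {
            this.id = id;
        }

        String read() {
            if (expired) {
                throw new IllegalStateException("session " + id + " expired");
            }
            return "data-from-session-" + id;
        }
    }

    // Problem shape: the handle is captured once, so every retry goes to
    // the same object even after a replacement has been installed.
    static String retryOnCapturedHandle(AtomicReference<Session> current,
                                        int attempts, Runnable betweenAttempts) {
        Session captured = current.get();
        for (int i = 0; i < attempts; i++) {
            try {
                return captured.read();
            } catch (IllegalStateException e) {
                betweenAttempts.run();
            }
        }
        return null; // all attempts failed
    }

    // Alternative shape: look up the current handle on every attempt.
    static String retryOnCurrentHandle(AtomicReference<Session> current,
                                       int attempts, Runnable betweenAttempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                return current.get().read();
            } catch (IllegalStateException e) {
                betweenAttempts.run();
            }
        }
        return null; // all attempts failed
    }

    public static void main(String[] args) {
        Session old = new Session(1);
        old.expired = true;
        AtomicReference<Session> holder = new AtomicReference<>(old);
        Runnable recreate = () -> holder.set(new Session(2));

        System.out.println(retryOnCapturedHandle(holder, 3, recreate));
        holder.set(old);
        System.out.println(retryOnCurrentHandle(holder, 3, recreate));
    }
}
```

In this model the first call exhausts its attempts on the expired session and returns null, while the second succeeds on its second attempt against the recreated session.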
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5549:
--
Fix Version/s: 0.96.0
Hadoop Flags: Reviewed

Integrated to TRUNK. Thanks for the patch, N. Thanks for the review, Stack.

Master can fail if ZooKeeper session expires

Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires
[ https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5549:
--
Resolution: Fixed
Status: Resolved (was: Patch Available)

Master can fail if ZooKeeper session expires

Key: HBASE-5549
URL: https://issues.apache.org/jira/browse/HBASE-5549
Project: HBase
Issue Type: Bug
Components: master, zookeeper
Affects Versions: 0.96.0
Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch
[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master
[ https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5572:
--
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Resolved as part of HBASE-5549

KeeperException.SessionExpiredException management could be improved in Master
--
Key: HBASE-5572
URL: https://issues.apache.org/jira/browse/HBASE-5572
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Fix For: 0.96.0
Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 5572.v2.patch

Synthesis:
1) TestMasterZKSessionRecovery distinguishes two cases on SessionExpiredException. One is explicitly not managed. However, it seems that there is no reason for this.
2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a quite complex function with a useless recursive call.
3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is equivalent to TestZooKeeper#testMasterSessionExpired.
4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be removed if we merge the two cases mentioned above.

Changes are:
2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a single case and remove the recursion
1) Removing TestMasterZKSessionRecovery

Detailed justification: testMasterZKSessionRecoveryFailure says:
{noformat}
/**
 * Negative test of master recovery from zk session expiry.
 *
 * Starts with one master. Fakes the master zk session expired.
 * Ensures the master cannot recover the expired zk session since
 * the master zk node is still there.
 */
public void testMasterZKSessionRecoveryFailure() throws Exception {
  MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
  HMaster m = cluster.getMaster();
  m.abort("Test recovery from zk session expired",
    new KeeperException.SessionExpiredException());
  assertTrue(m.isStopped());
}
{noformat}
This test works, i.e. the assertion always holds. But do we really want this behavior? When looking at the code, we see that what happens is strange:
- HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, HMaster#abort stops the master.
- HMaster#abortNow checks the exception type. As it is a SessionExpiredException, it tries to recover by calling HMaster#tryRecoveringExpiredZKSession. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and then tries to become the active master. If it cannot, it returns false (and that makes HMaster#abort stop the master).
- HMaster#becomeActiveMaster returns the result of ActiveMasterManager#blockUntilBecomingActiveMaster. blockUntilBecomingActiveMaster says it will return false if there is any error preventing it from becoming the active master.
- ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master address. If it has the same host and port, it deletes the node, which starts a recursive call to blockUntilBecomingActiveMaster. This second call succeeds (we became the active master) and returns true. This result is ignored by the first blockUntilBecomingActiveMaster: it returns false (even though we actually became the active master), hence the whole call chain returns false and HMaster#abort stops the master.

In other words, the comment says "Ensures the master cannot recover the expired zk session since the master zk node is still there." but we are actually checking for exactly this case and deleting the node. If we were not ignoring the result, we would return true, so we would not stop the master, so the test would fail.
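The control-flow problem described for blockUntilBecomingActiveMaster can be reduced to a few lines. This is a toy model with hypothetical names, not the ActiveMasterManager code: the AtomicBoolean stands for "a master znode carrying our own host and port is still present". The first method mirrors the described shape, where the recursive call succeeds but the outer call still reports failure; the second is one possible single-case form without recursion, offered as an assumption about what "a single case" could look like rather than as the actual patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model, not HBase code.
class IgnoredRecursiveResultSketch {

    // Described shape: delete the stale node, recurse, then drop the
    // result of the recursive call and report failure anyway.
    static boolean recursiveIgnoringResult(AtomicBoolean staleNodePresent) {
        if (staleNodePresent.get()) {
            staleNodePresent.set(false);               // delete our stale node
            recursiveIgnoringResult(staleNodePresent); // returns true, ignored
            return false;
        }
        return true; // no stale node: we become the active master
    }

    // Single-case shape: clean up the stale node if present, then proceed
    // on the same path and report the real outcome.
    static boolean singleCase(AtomicBoolean staleNodePresent) {
        if (staleNodePresent.get()) {
            staleNodePresent.set(false);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(recursiveIgnoringResult(new AtomicBoolean(true)));
        System.out.println(singleCase(new AtomicBoolean(true)));
    }
}
```

In both methods the stale node ends up deleted and the caller has in effect become active; only the returned value differs, which is what decides whether the master is stopped.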