[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5568:


Attachment: HBASE-5568v2.patch

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch, HBASE-5568v2.patch


 HRegion#flushcache() can currently be called concurrently, through 
 HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by 
 HBaseAdmin.
 However, if HRegion#internalFlushcache() is called concurrently by multiple 
 threads, HRegion.memstoreSize is calculated incorrectly.
 At the end of HRegion#internalFlushcache() we call 
 this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the 
 actual memstore size that was flushed to HDFS. This makes HRegion.memstoreSize 
 negative and prevents the next flush when we close this region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
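
 The accounting error can be read off the log above: the three "Finished 
 memstore flush" lines subtract about 128.1m + 134.8m + 6.8m = 269.7m, while 
 the six Store "Added" lines report only about 134.9m of memsize actually 
 written. The sketch below replays a simplified version of that interleaving. 
 It is a model under stated assumptions, not HBase code: the class and method 
 names (FlushAccountingSketch, racyFlushes, accurateFlushes) are hypothetical, 
 and the second variant only illustrates the idea of subtracting what was 
 really written, not necessarily what the attached patch does.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of the accounting race; not HBase code. */
final class FlushAccountingSketch {

    /**
     * Two flushes each sample the region counter up front, but the data is
     * written only once. Returns the counter after both flushes finish.
     */
    public static long racyFlushes(long bytesInMemstore) {
        AtomicLong memstoreSize = new AtomicLong(bytesInMemstore);
        long flushSizeA = memstoreSize.get(); // flush A samples the counter
        long flushSizeB = memstoreSize.get(); // flush B samples before A subtracts
        // Together A and B write bytesInMemstore to disk exactly once ...
        memstoreSize.addAndGet(-flushSizeA);  // ... but A subtracts its sample
        memstoreSize.addAndGet(-flushSizeB);  // ... and B subtracts its sample too
        return memstoreSize.get();
    }

    /** The same two flushes, each subtracting only what it actually wrote. */
    public static long accurateFlushes(long bytesInMemstore) {
        AtomicLong memstoreSize = new AtomicLong(bytesInMemstore);
        long writtenByA = bytesInMemstore; // A snapshots and writes everything
        long writtenByB = 0;               // B finds nothing left to write
        memstoreSize.addAndGet(-writtenByA);
        memstoreSize.addAndGet(-writtenByB);
        return memstoreSize.get();
    }

    public static void main(String[] args) {
        System.out.println(racyFlushes(128));     // -128
        System.out.println(accurateFlushes(128)); // 0
    }
}
```

 In the racy variant the counter ends up negative by the amount that was 
 subtracted twice, which matches the symptom described in the report.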

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230940#comment-13230940
 ] 

Hadoop QA commented on HBASE-5568:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518621/HBASE-5568v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1198//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1198//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1198//console

This message is automatically generated.


[jira] [Commented] (HBASE-4269) Add tests and restore semantics to TableInputFormat/TableRecordReader

2012-03-16 Thread Jan Lukavsky (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230972#comment-13230972
 ] 

Jan Lukavsky commented on HBASE-4269:
-

Hi,

I think the patch for this issue changed the semantics of the mapreduce API. In 
HBASE-4196 there was no change in semantics in 
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl; the only change was in 
org.apache.hadoop.hbase.mapred.TableRecordReaderImpl (where the catch of 
UnknownScannerException was changed to IOException). Now the semantics of the 
mapreduce API differ from those before HBASE-4196, and I think this 
should be reverted. Is there any reason to have different semantics for the 
two APIs? Wouldn't it be better to accept the change of semantics in 
HBASE-4196? Are there any negative side effects of this change? I don't see any 
discussion of whether we need to change the semantics back.

Thanks for reply :)

 Jan

 Add tests and restore semantics to TableInputFormat/TableRecordReader
 -

 Key: HBASE-4269
 URL: https://issues.apache.org/jira/browse/HBASE-4269
 Project: HBase
  Issue Type: Improvement
  Components: mapred, mapreduce, test
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.5

 Attachments: 
 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch, 
 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch


 HBASE-4196 modified the semantics of failures in 
 TableInputFormat/TableRecordReader, and had no test cases.  This patch 
 restores the semantics to rethrow when a DoNotRetryIOException is triggered 
 and adds test cases.
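
 The restored behaviour described above (retry on an ordinary IOException, 
 rethrow on a DoNotRetryIOException) can be sketched roughly as follows. This 
 is an illustration under assumptions, not the code in the attached patch: 
 Scanner, FatalScanException and nextRow are hypothetical stand-ins, with 
 FatalScanException playing the role of DoNotRetryIOException.

```java
import java.io.IOException;

/** Hypothetical stand-ins for the scanner and exception types; not HBase classes. */
final class RethrowSketch {

    /** Plays the role of DoNotRetryIOException in this sketch. */
    public static class FatalScanException extends IOException {
        public FatalScanException(String msg) { super(msg); }
    }

    /** Minimal scanner abstraction: returns the next row or throws. */
    public interface Scanner {
        String next() throws IOException;
    }

    /**
     * Reads one row. A fatal exception propagates to the caller; any other
     * IOException triggers a single retry on a restarted scanner.
     */
    public static String nextRow(Scanner scanner, Scanner restarted) throws IOException {
        try {
            return scanner.next();
        } catch (FatalScanException e) {
            throw e;                 // do not retry, let the caller see it
        } catch (IOException e) {
            return restarted.next(); // e.g. a scanner timeout: retry once
        }
    }

    public static void main(String[] args) throws IOException {
        Scanner timedOut = () -> { throw new IOException("scanner timed out"); };
        Scanner fresh = () -> "row-1";
        System.out.println(nextRow(timedOut, fresh)); // row-1
    }
}
```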





[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch


 RecoverableZooKeeper has a retry mechanism, but when the session expires the 
 whole ZooKeeperWatcher is recreated, so the retry mechanism does not work in 
 this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for the 
 ZooKeeperWatcher to be recreated before using the connection.
 This can happen in real life, when:
 - master and zookeeper start
 - the zookeeper connection is cut
 - the master enters the retry loop
 - in the meantime the session expires
 - the network comes back and the session is recreated
 - the retries continue, but on the wrong object, and hence fail.
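
 The sequence above can be sketched as a retry loop that keeps using the 
 object it started with, next to one that looks up the current object on each 
 attempt. This is a simplified illustration, not the HBase or ZooKeeper API 
 and not necessarily how the attached patches address the issue: Connection, 
 retry and both retryOn... methods are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-ins; none of these types are HBase or ZooKeeper classes. */
final class StaleRetrySketch {

    /** A connection that stops working once its session has expired. */
    static final class Connection {
        private volatile boolean expired;
        void expire() { expired = true; }
        String read() {
            if (expired) throw new IllegalStateException("session expired");
            return "ok";
        }
    }

    /**
     * Retries a read, asking the supplier for the connection on every attempt.
     * onFailure runs between attempts and stands in for the watcher being
     * recreated elsewhere. Returns null if every attempt fails.
     */
    public static String retry(Supplier<Connection> source, int attempts, Runnable onFailure) {
        for (int i = 0; i < attempts; i++) {
            try {
                return source.get().read();
            } catch (IllegalStateException e) {
                onFailure.run();
            }
        }
        return null;
    }

    /** Retry loop pinned to the connection that existed when it started. */
    public static String retryOnCapturedConnection() {
        AtomicReference<Connection> current = new AtomicReference<>(new Connection());
        Connection captured = current.get();
        captured.expire();
        return retry(() -> captured, 3, () -> current.set(new Connection()));
    }

    /** Retry loop that looks up the current connection on each attempt. */
    public static String retryOnCurrentConnection() {
        AtomicReference<Connection> current = new AtomicReference<>(new Connection());
        current.get().expire();
        return retry(current::get, 3, () -> current.set(new Connection()));
    }

    public static void main(String[] args) {
        System.out.println(retryOnCapturedConnection()); // null
        System.out.println(retryOnCurrentConnection());  // ok
    }
}
```

 The first variant never succeeds even though a new connection exists, because 
 every retry is issued against the expired object.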





[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Attachment: nochange.patch

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Patch Available  (was: Open)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, nochange.patch






[jira] [Commented] (HBASE-4914) Enhance MapReduce TableInputFormat to Support N-mappers per Region

2012-03-16 Thread Cosmin Lehene (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231008#comment-13231008
 ] 

Cosmin Lehene commented on HBASE-4914:
--

Hadoop 0.20 doesn't behave well with a large number of map tasks, so we 
implemented N regions per map (through a splits_per_map property). 

I guess ideally we should also be able to specify a min/max number of map tasks 
and have these two happen implicitly, perhaps with some sane thresholds.

 Enhance MapReduce TableInputFormat to Support N-mappers per Region
 --

 Key: HBASE-4914
 URL: https://issues.apache.org/jira/browse/HBASE-4914
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
Priority: Blocker
 Fix For: 0.94.0


 Currently, TableInputFormat-based MR jobs create exactly one mapper per 
 region, where each mapper sets one Scan with the appropriate start/stop row 
 keys. This change allows jobs to be run with any number of mappers per region, 
 so that when a mapper fails, there is less data to reprocess.
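
 As a rough illustration of running several mappers per region, the sketch 
 below cuts one region's key range into n contiguous sub-ranges, each of which 
 could back one Scan. It is a sketch under assumptions, not the proposed 
 implementation: row keys are modelled as small non-negative longs rather than 
 HBase byte[] keys, and RegionSplitSketch, Range and split are hypothetical 
 names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: row keys are modelled as longs, not HBase byte[] keys. */
final class RegionSplitSketch {

    /** Half-open key range [start, stop). */
    static final class Range {
        final long start;
        final long stop;
        Range(long start, long stop) { this.start = start; this.stop = stop; }
        @Override public String toString() { return "[" + start + ", " + stop + ")"; }
    }

    /**
     * Cuts one region's range into n contiguous sub-ranges of near-equal
     * width, one per mapper. Assumes 0 <= start < stop, 1 <= n <= stop - start,
     * and values small enough that width * n does not overflow a long.
     */
    public static List<Range> split(long start, long stop, int n) {
        List<Range> out = new ArrayList<>();
        long width = stop - start;
        for (int i = 0; i < n; i++) {
            long lo = start + width * i / n;
            long hi = start + width * (i + 1) / n;
            out.add(new Range(lo, hi));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split(0, 100, 4)); // [[0, 25), [25, 50), [50, 75), [75, 100)]
    }
}
```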





[jira] [Commented] (HBASE-4608) HLog Compression

2012-03-16 Thread Lars Francke (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231009#comment-13231009
 ] 

Lars Francke commented on HBASE-4608:
-

This seems to be missing documentation, no?

Shouldn't the hbase.regionserver.wal.enablecompression key at least be in 
hbase-default.xml?

 HLog Compression
 

 Key: HBASE-4608
 URL: https://issues.apache.org/jira/browse/HBASE-4608
 Project: HBase
  Issue Type: New Feature
Reporter: Li Pi
Assignee: Li Pi
 Fix For: 0.94.0

 Attachments: 4608-v19.txt, 4608-v20.txt, 4608-v22.txt, 4608v1.txt, 
 4608v13.txt, 4608v13.txt, 4608v14.txt, 4608v15.txt, 4608v16.txt, 4608v17.txt, 
 4608v18.txt, 4608v23.txt, 4608v24.txt, 4608v25.txt, 4608v27.txt, 4608v29.txt, 
 4608v30.txt, 4608v5.txt, 4608v6.txt, 4608v7.txt, 4608v8fixed.txt, 
 hbase-4608-v28-delta.txt, hbase-4608-v28.txt, hbase-4608-v28.txt


 The current bottleneck to HBase write speed is replicating the WAL appends 
 across different datanodes. We can speed up this process by compressing the 
 HLog. The current plan involves using a dictionary to compress the table name, 
 region id, cf name, and possibly other bits of repeated data. The HLog format 
 may also be changed in other ways to produce a smaller HLog.
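
 The dictionary idea mentioned above can be illustrated with a minimal 
 sketch: each repeated value (a table name, region id or cf name) is assigned 
 a small index the first time it is seen, and later occurrences can be written 
 as that index instead of the full value. This is not the HLog format or the 
 code in the attached patches; DictionarySketch, encode and decode are 
 hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of dictionary encoding; not the HLog format. */
final class DictionarySketch {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    /** Returns the index for a value, adding it to the dictionary on first use. */
    public int encode(String value) {
        Integer idx = indexOf.get(value);
        if (idx == null) {
            idx = entries.size();
            entries.add(value);
            indexOf.put(value, idx);
        }
        return idx;
    }

    /** Reverses encode for an index that was handed out earlier. */
    public String decode(int index) {
        return entries.get(index);
    }

    public static void main(String[] args) {
        DictionarySketch dict = new DictionarySketch();
        int first = dict.encode("writetest1");  // new entry
        int second = dict.encode("cf1");        // new entry
        int repeat = dict.encode("writetest1"); // reuses the first index
        System.out.println(first + " " + second + " " + repeat); // 0 1 0
    }
}
```

 A reader would need to rebuild the same dictionary in the same order, which 
 is why the first occurrence of each value still has to be written in full.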





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231029#comment-13231029
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518639/nochange.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1199//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1199//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1199//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Patch Available  (was: Open)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Attachment: 5549.v7.patch

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, nochange.patch






[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231159#comment-13231159
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518660/5549.v7.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMultiParallel
  org.apache.hadoop.hbase.TestDrainingServer
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1200//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1200//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1200//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Patch Available  (was: Open)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Attachment: 5549.v8.patch

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Attachment: 5549.v9.patch

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Patch Available  (was: Open)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 5549.v9.patch, nochange.patch






[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-16 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231187#comment-13231187
 ] 

Jonathan Hsieh commented on HBASE-5563:
---

@Chunhui

I get it -- we just didn't see the robot run the v2 to show that it fixed the 
problem.

Are you ok with the newer patch (and test?) that fixes getRegionsOfTable?  I 
think it is more intuitive since older HRI's with smaller datestamp/regionIds 
are smaller than newer HRI's with larger datestamp/regionIds.  

It's helpful to add comments and tests to show what you intend -- hopefully the 
updated patch I provided makes it clear.

 HRegionInfo#compareTo add the comparison of regionId
 

 Key: HBASE-5563
 URL: https://issues.apache.org/jira/browse/HBASE-5563
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-5563.patch, HBASE-5563v2.patch, 
 HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch


 In the case where one region is assigned multiple times, we can find two 
 regions with the same table name, same startKey, same endKey, and different 
 regionId, so these two regions are treated as the same key in a TreeMap but 
 as different keys in a HashMap.
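
A minimal sketch of the mismatch, assuming a simplified stand-in for HRegionInfo: RegionKey and WITHOUT_REGION_ID are hypothetical names, the keys are Strings rather than byte arrays, and this is not the code from the attached patches. It shows an ordering that ignores regionId collapsing two entries in a TreeMap while a HashMap keeps both, and a compareTo that adds regionId as the last tie-breaker.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical stand-in for HRegionInfo; keys are Strings here for brevity. */
final class RegionKey implements Comparable<RegionKey> {
    /** Ordering that ignores regionId, as in the behaviour the issue describes. */
    static final Comparator<RegionKey> WITHOUT_REGION_ID =
        Comparator.comparing((RegionKey r) -> r.table)
            .thenComparing(r -> r.startKey)
            .thenComparing(r -> r.endKey);

    final String table;
    final String startKey;
    final String endKey;
    final long regionId;

    RegionKey(String table, String startKey, String endKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.endKey = endKey;
        this.regionId = regionId;
    }

    /** Same ordering, with regionId as the final tie-breaker (older ids sort first). */
    @Override
    public int compareTo(RegionKey other) {
        int c = WITHOUT_REGION_ID.compare(this, other);
        return c != 0 ? c : Long.compare(regionId, other.regionId);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RegionKey)) {
            return false;
        }
        RegionKey r = (RegionKey) o;
        return table.equals(r.table) && startKey.equals(r.startKey)
            && endKey.equals(r.endKey) && regionId == r.regionId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, startKey, endKey, regionId);
    }

    /** Inserts both keys into the given map and reports how many entries survive. */
    static int sizeAfterInsert(Map<RegionKey, String> map, RegionKey a, RegionKey b) {
        map.put(a, "server-1");
        map.put(b, "server-2");
        return map.size();
    }

    public static void main(String[] args) {
        RegionKey older = new RegionKey("t1", "a", "m", 1L);
        RegionKey newer = new RegionKey("t1", "a", "m", 2L);

        System.out.println("HashMap: "
            + sizeAfterInsert(new HashMap<>(), older, newer));
        System.out.println("TreeMap ignoring regionId: "
            + sizeAfterInsert(new TreeMap<>(WITHOUT_REGION_ID), older, newer));
        System.out.println("TreeMap with regionId: "
            + sizeAfterInsert(new TreeMap<>(), older, newer));
    }
}
```

With regionId included, compareTo returns 0 only when equals returns true, so sorted and hashed collections agree on which regions are distinct.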





[jira] [Commented] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-03-16 Thread Storm Lee (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231204#comment-13231204
 ] 

Storm Lee commented on HBASE-5578:
--

I hit it only once; it may not be easy to reproduce. My HBase cluster has 9 
RSes, and only one crashed. It was a fresh start-up, with nothing under /hbase. 
Then I used a tool to put data continuously (using HTable.put()). This RS had 
been running for about 45 hours when the NPE happened. Compactions and splits 
were also running the whole time.

 NPE when regionserver reported server load, caused rs stop.
 ---

 Key: HBASE-5578
 URL: https://issues.apache.org/jira/browse/HBASE-5578
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
Reporter: Storm Lee
Priority: Critical

 The regionserver log:
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 data3,60020,1331286604591: Unhandled exception: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
   at java.lang.Thread.run(Thread.java:662)
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
 loaded coprocessors are: []
 2012-03-11 11:55:37,808 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
 requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
 numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
 totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
 memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
 compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
 blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
 blockCacheHitCount=87713, blockCacheMissCount=22144560, 
 blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
 blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
 2012-03-11 11:55:37,992 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
 exception: null





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231217#comment-13231217
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518664/5549.v8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.TestZooKeeper

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1201//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1201//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1201//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Ashutosh Jindal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Jindal updated HBASE-5206:
---

Attachment: 5206_trunk_latest_2.patch

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 
 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-16 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231236#comment-13231236
 ] 

Jonathan Hsieh commented on HBASE-5563:
---

Tests came back clean or came back with failures that were intermittent but 
clean when run locally on all versions.

 HRegionInfo#compareTo add the comparison of regionId
 

 Key: HBASE-5563
 URL: https://issues.apache.org/jira/browse/HBASE-5563
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-5563.patch, HBASE-5563v2.patch, 
 HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 5549.v9.patch, nochange.patch






[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231242#comment-13231242
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518665/5549.v9.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.TestZooKeeper
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1202//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1202//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1202//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v6.patch, 5549.v7.patch, 5549.v8.patch, 
 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Patch Available  (was: Open)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 
 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Assigned] (HBASE-5589) Add of the offline call to the Master Interface

2012-03-16 Thread Jonathan Hsieh (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-5589:
-

Assignee: Jonathan Hsieh

 Add of the offline call to the Master Interface
 ---

 Key: HBASE-5589
 URL: https://issues.apache.org/jira/browse/HBASE-5589
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.6, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 Hbck from HBASE-5128 requires an offline method on the master to properly 
 clean up state during certain assignment repair operations.  This method 
 will be added to recent and older versions of HBase.





[jira] [Assigned] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread Jonathan Hsieh (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh reassigned HBASE-5588:
-

Assignee: Jonathan Hsieh

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.





[jira] [Created] (HBASE-5592) Make it easier to get a table from shell

2012-03-16 Thread Ben West (Created) (JIRA)
Make it easier to get a table from shell


 Key: HBASE-5592
 URL: https://issues.apache.org/jira/browse/HBASE-5592
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.0
Reporter: Ben West
Assignee: Ben West
Priority: Trivial
 Fix For: 0.94.0
 Attachments: publicTable.patch

The one argument constructor to HTable was removed at some point, which means 
that you now have to pass in a Configuration to instantiate an HTable. This is 
annoying for me when I create quick scripts.

This JIRA is a tiny patch which lets you get an HTable instance in the shell by 
doing
{code}foo_table = @shell.hbase_table('foo').table{code}

Basically, it is changing table to be a public member rather than a private one.





[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231284#comment-13231284
 ] 

Hadoop QA commented on HBASE-5206:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12518673/5206_trunk_latest_2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.TestZooKeeper

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1203//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1203//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1203//console

This message is automatically generated.

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 
 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch






[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231299#comment-13231299
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518676/5549.v10.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1204//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1204//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1204//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 
 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231306#comment-13231306
 ] 

Zhihong Yu commented on HBASE-5206:
---

Recent result is better. Good progress.
{code}
+   * @return True if the table is present
{code}
Please change 'True' to 'true'
{code}
+while (!ZKTable.isEnabledTable(zkw, testMasterAdmin)) {
+  Thread.sleep(100);
+}
{code}
Please shorten the sleep interval to 10 ms. A timeout is also desirable for the wait.
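
A bounded wait of the kind requested above could look roughly like the sketch below. It is not the code from the patch: WaitSketch and waitFor are hypothetical names, and the caller picks the timeout; only the 10 ms poll interval comes from the comment.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: polls a condition with a bounded wait.
final class WaitSketch {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition became true, false on timeout or interrupt.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides whether to fail the test
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

In the test, the condition would wrap the ZKTable.isEnabledTable(zkw, testMasterAdmin) check from the hunk above (handling any checked exception it declares), and the test would fail if waitFor returns false.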


 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 
 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231305#comment-13231305
 ] 

nkeywal commented on HBASE-5549:


Can be committed. Hopefully this is the end of the ZooKeeper expiry flakiness.


 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 
 5549.v8.patch, 5549.v9.patch, nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can happen in real life, for example when:
 - master and zookeeper start
 - zookeeper connection is cut
 - master enters the retry loop
 - in the meantime the session expires
 - the network comes back, the session is recreated
 - the retries continue, but on the wrong object, and hence fail.





[jira] [Created] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread David S. Wang (Created) (JIRA)
Reverse DNS resolution in regionServerStartup() does not strip trailing dot
---

 Key: HBASE-5593
 URL: https://issues.apache.org/jira/browse/HBASE-5593
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7


HBASE-4109 covered the removal of trailing dots in PTR records from reverse DNS 
lookups.  We seem to have missed a case in HMaster#regionServerStartup().
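
For illustration, stripping the trailing dot from a reverse-lookup result could be sketched as below. This is an assumption about the shape of the fix, not the content of the attached patch: HostnameSketch and stripTrailingDot are hypothetical names, and the host names in the usage note are made up.

```java
// Hypothetical helper, not the HBase code: normalizes a name returned by a PTR lookup.
final class HostnameSketch {
  /** Returns the hostname without a single trailing dot; other input is returned unchanged. */
  static String stripTrailingDot(String hostname) {
    if (hostname != null && hostname.endsWith(".")) {
      return hostname.substring(0, hostname.length() - 1);
    }
    return hostname;
  }
}
```

With this sketch, "rs1.example.com." becomes "rs1.example.com", while a name without the trailing dot passes through untouched.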





[jira] [Updated] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5568:
--

Attachment: (was: HBASE-5568v2.patch)

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch


 HRegion#flushcache() can now be called concurrently, through 
 HRegionServer#splitRegion or through HRegionServer#flushRegion via HBaseAdmin.
 However, we found that if HRegion#internalFlushcache() is called concurrently 
 by multiple threads, HRegion.memstoreSize is calculated incorrectly.
 At the end of HRegion#internalFlushcache() we call 
 this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the 
 actual memstore size that was flushed to HDFS. This can make 
 HRegion.memstoreSize negative and prevent the next flush when we close the 
 region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
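
 The accounting problem visible in these logs can be replayed with plain arithmetic. The sketch below is only an illustration under stated assumptions, not the HRegion code or the attached patch: FlushAccountingSketch and its members are hypothetical, sizes are in tenths of a megabyte taken from the log entries (134.8m at the start of the second flush, 59.6m + 68.5m written by the first flush, 3.1m + 3.6m written by the second), and writes arriving in between are ignored.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical replay of the counter updates; sizes are in tenths of a megabyte.
final class FlushAccountingSketch {
  static final long SIZE_AT_SECOND_FLUSH_START = 1348; // "current region memstore size 134.8m"
  static final long FIRST_FLUSH_ACTUAL = 1281;         // 59.6m + 68.5m
  static final long SECOND_FLUSH_ACTUAL = 67;          // 3.1m + 3.6m

  /** The second flush subtracts the size it read when it started, which is stale by then. */
  static long subtractSizeReadAtStart() {
    AtomicLong memstoreSize = new AtomicLong(SIZE_AT_SECOND_FLUSH_START);
    long staleFlushSize = memstoreSize.get();       // second flush starts and reads 134.8m
    memstoreSize.addAndGet(-FIRST_FLUSH_ACTUAL);    // first flush finishes
    return memstoreSize.addAndGet(-staleFlushSize); // second flush finishes
  }

  /** Each flush subtracts only what it actually wrote out. */
  static long subtractActuallyFlushed() {
    AtomicLong memstoreSize = new AtomicLong(SIZE_AT_SECOND_FLUSH_START);
    memstoreSize.addAndGet(-FIRST_FLUSH_ACTUAL);
    return memstoreSize.addAndGet(-SECOND_FLUSH_ACTUAL);
  }
}
```

 In this replay the first variant ends at -1281 (about -128.1m) and the second at 0. Whether the v2 patch corrects the size this way or serializes the flushes instead is not shown in this thread.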





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231318#comment-13231318
 ] 

Zhihong Yu commented on HBASE-5568:
---

Patch v2 looks good.
Will integrate if there is no objection.

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch






[jira] [Updated] (HBASE-5490) Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 EventHandler

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5490:
-

Fix Version/s: (was: 0.90.6)
   0.90.7

 Move the enum RS_ZK_REGION_FAILED_OPEN to the last of the enum list in 0.90 
 EventHandler
 

 Key: HBASE-5490
 URL: https://issues.apache.org/jira/browse/HBASE-5490
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7

 Attachments: 5490-v2.txt, HBASE-5490.patch


 The newly added state RS_ZK_REGION_FAILED_OPEN was causing the rolling 
 restart to fail.
 So move the new enum value to the end of the list.
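
 One plausible reason, stated here as an assumption rather than something this thread confirms, is that the event type is exchanged by ordinal, so inserting a constant in the middle shifts the ordinals that older servers expect. The sketch below shows the effect with a reduced, hypothetical list: EXISTING_A and EXISTING_B are placeholders, and only RS_ZK_REGION_FAILED_OPEN is a name from the issue.

```java
// Hypothetical, reduced enum lists; only RS_ZK_REGION_FAILED_OPEN comes from the issue.
final class EnumOrderSketch {
  enum OldOrder { EXISTING_A, EXISTING_B }

  enum InsertedInMiddle { EXISTING_A, RS_ZK_REGION_FAILED_OPEN, EXISTING_B }

  enum AppendedAtEnd { EXISTING_A, EXISTING_B, RS_ZK_REGION_FAILED_OPEN }

  /** Returns true if every constant of the old enum keeps its ordinal in the new enum. */
  static <A extends Enum<A>, B extends Enum<B>> boolean oldOrdinalsStable(
      Class<A> oldType, Class<B> newType) {
    A[] oldValues = oldType.getEnumConstants();
    B[] newValues = newType.getEnumConstants();
    if (newValues.length < oldValues.length) {
      return false;
    }
    for (int i = 0; i < oldValues.length; i++) {
      if (!oldValues[i].name().equals(newValues[i].name())) {
        return false; // a process on the old list would decode ordinal i differently
      }
    }
    return true;
  }
}
```

 Appending keeps EXISTING_B at ordinal 1, whereas inserting in the middle moves it to ordinal 2, which is the kind of mismatch a mixed-version cluster would see during a rolling restart.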





[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5153:
-

Fix Version/s: (was: 0.90.6)
   0.90.7

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.7

 Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 
 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. From that issue we know that if multiple 
 threads share the same connection and the connection is aborted in one 
 thread, the other threads get a 
 HConnectionManager$HConnectionImplementation@18fb1f7 closed exception.
 That fix solves the problem of stale connections not being removed, but the 
 original HTable instance can no longer be used; the connection in HTable 
 should be recreated.
 There are two approaches to solve this:
 1. In user code, on catching an IOE, close the connection and re-create the 
 HTable instance. We can use this as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
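
 The first approach (the user-code workaround) follows a general close-and-recreate pattern, sketched below. This is not HBase client code: RecreateOnFailureSketch, Handle and the factory are hypothetical stand-ins for an HTable-like object and whatever creates it, and a single retry is an arbitrary choice.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical wrapper illustrating "close and re-create on IOException".
final class RecreateOnFailureSketch {
  /** Stand-in for a table handle bound to a shared connection. */
  interface Handle extends Closeable {
    String get(String row) throws IOException;
  }

  private final Supplier<Handle> factory;
  private Handle handle;

  RecreateOnFailureSketch(Supplier<Handle> factory) {
    this.factory = factory;
    this.handle = factory.get();
  }

  /** Tries once; on IOException closes the old handle, creates a new one and retries once. */
  String get(String row) throws IOException {
    try {
      return handle.get(row);
    } catch (IOException first) {
      try {
        handle.close();
      } catch (IOException ignored) {
        // the old handle is already unusable; nothing more to do with it
      }
      handle = factory.get();
      return handle.get(row);
    }
  }
}
```

 The second approach would move the same catch-and-recreate step inside the client library so that callers do not need such a wrapper.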





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231324#comment-13231324
 ] 

Zhihong Yu commented on HBASE-5568:
---

@Chunhui:
A patch for 0.92 is needed:
{code}
p0 HBASE-5568v2.patch
...
Hunk #1 FAILED at 152.
1 out of 1 hunk FAILED -- saving rejects to file 
src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java.rej
{code}
Please update the patch for 0.90 as well.

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch






[jira] [Updated] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5588:
--

Status: Patch Available  (was: Open)

Suite against the four versions is running.

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.92.0, 0.90.5, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, 
 hbase-5588.patch


 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.
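
 The deprecate-then-remove step described above usually means keeping the old method as a thin delegate for the older branches. The sketch below shows that shape only; it is not the content of the attached patches. DeprecationSketch, the String region argument and the set of regions in transition are hypothetical, and only the two method names come from this issue.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the manager class; signatures are illustrative only.
final class DeprecationSketch {
  private final Set<String> regionsInTransition = new HashSet<>();

  void markInTransition(String region) {
    regionsInTransition.add(region);
  }

  boolean isInTransition(String region) {
    return regionsInTransition.contains(region);
  }

  /** The method that remains in all versions. */
  void regionOffline(String region) {
    regionsInTransition.remove(region);
  }

  /**
   * Kept on older branches for compatibility and dropped on trunk.
   *
   * @deprecated use {@link #regionOffline(String)} instead
   */
  @Deprecated
  void clearRegionFromTransition(String region) {
    regionOffline(region);
  }
}
```

 Callers on the older branches keep compiling (with a deprecation warning), while trunk would simply drop the delegate.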





[jira] [Updated] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread Jonathan Hsieh (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5588:
--

Attachment: hbase-5588-0.90.patch
hbase-5588-0.94.patch
hbase-5588.patch

hbase-5588.patch removes clearRegionFromTransition
hbase-5588-0.94.patch deprecates and is also applicable to 0.92
hbase-5588-0.90.patch deprecates.

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, 
 hbase-5588.patch


 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.





[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231339#comment-13231339
 ] 

Hadoop QA commented on HBASE-5588:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518692/hbase-5588-0.90.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1205//console

This message is automatically generated.

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, 
 hbase-5588.patch


 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.





[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231341#comment-13231341
 ] 

Zhihong Yu commented on HBASE-5588:
---

+1 on patches.

Hopefully Hadoop QA can pick up hbase-5588.patch

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, 
 hbase-5588.patch


 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.





[jira] [Updated] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread David S. Wang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David S. Wang updated HBASE-5593:
-

Attachment: HBASE-5593.patch

 Reverse DNS resolution in regionServerStartup() does not strip trailing dot
 ---

 Key: HBASE-5593
 URL: https://issues.apache.org/jira/browse/HBASE-5593
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7

 Attachments: HBASE-5593.patch


 HBASE-4109 covered the removal of trailing dots in PTR records from reverse 
 DNS lookups.  We seem to have missed a case in HMaster#regionServerStartup().





[jira] [Updated] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread David S. Wang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David S. Wang updated HBASE-5593:
-

Affects Version/s: (was: 0.90.5)
   0.90.6
   Status: Patch Available  (was: Open)

One-liner change.  Ran through Jenkins.

 Reverse DNS resolution in regionServerStartup() does not strip trailing dot
 ---

 Key: HBASE-5593
 URL: https://issues.apache.org/jira/browse/HBASE-5593
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7

 Attachments: HBASE-5593.patch


 HBASE-4109 covered the removal of trailing dots in PTR records from reverse 
 DNS lookups.  We seem to have missed a case in HMaster#regionServerStartup().





[jira] [Commented] (HBASE-4269) Add tests and restore semantics to TableInputFormat/TableRecordReader

2012-03-16 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231356#comment-13231356
 ] 

Jonathan Hsieh commented on HBASE-4269:
---

Jan, 

When I did this patch, the two versions had different error recovery paths and I 
made them similar.  UnknownScannerException is a subclass of 
DoNotRetryIOException, so I chose that instead.  

I'm assuming this is causing some pain now -- how is this affecting the job you 
are running? (Is it catching and rethrowing other exceptions as well?) 

If there is something we need to change, I'm fine with that.  Let's file a new 
issue -- this patch has been included in a few releases now. 

 Add tests and restore semantics to TableInputFormat/TableRecordReader
 -

 Key: HBASE-4269
 URL: https://issues.apache.org/jira/browse/HBASE-4269
 Project: HBase
  Issue Type: Improvement
  Components: mapred, mapreduce, test
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.5

 Attachments: 
 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch, 
 0001-HBASE-4269-Add-tests-and-restore-semantics-to-TableI.patch


 HBASE-4196 modified the semantics of failures in 
 TableInputFormat/TableRecordReader and had no test cases.  This patch 
 restores the semantics, rethrowing when a DoNotRetryIOException is triggered, 
 and adds test cases.





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231361#comment-13231361
 ] 

Zhihong Yu commented on HBASE-5568:
---

Integrated to TRUNK.

Thanks for the patch Chunhui.

 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch


 HRegion#flushcache() can now be called concurrently, through 
 HRegionServer#splitRegion or HRegionServer#flushRegion invoked by HBaseAdmin.
 However, if HRegion#internalFlushcache() is called concurrently by multiple 
 threads, HRegion.memstoreSize is calculated incorrectly.
 At the end of HRegion#internalFlushcache() we call 
 this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the 
 actual memsize that was flushed to HDFS. As a result HRegion.memstoreSize 
 becomes negative, which prevents the next flush when we close this region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
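
The mismatch in these logs can be checked with a little arithmetic. The sketch below is only an illustration: the FlushAccounting class is hypothetical, the figures are copied from the log lines above and grouped by sequenceid, and it assumes that the size printed in "Finished memstore flush of ~X" is the flushsize subtracted from memstoreSize and that "k" means 1/1024 of "m".

```java
final class FlushAccounting {
    /** One flush from the logs: the size reported on completion and the memsize of each store file added. */
    static final class Flush {
        final double reportedMb;
        final double[] storeMemsizeMb;

        Flush(double reportedMb, double... storeMemsizeMb) {
            this.reportedMb = reportedMb;
            this.storeMemsizeMb = storeMemsizeMb;
        }

        /** Sum of the memsize values of the store files this flush wrote. */
        double writtenMb() {
            double sum = 0;
            for (double mb : storeMemsizeMb) {
                sum += mb;
            }
            return sum;
        }
    }

    /** The three flushes in the log excerpt, keyed by sequenceid in the comments. */
    static final Flush[] FLUSHES = {
        new Flush(128.1, 59.6, 68.5),                  // sequenceid=619316544
        new Flush(134.8, 3.1, 3.6),                    // sequenceid=619332759
        new Flush(6.8, 47.4 / 1024.0, 47.8 / 1024.0),  // sequenceid=619332979
    };

    static double totalReportedMb() {
        double sum = 0;
        for (Flush f : FLUSHES) {
            sum += f.reportedMb;
        }
        return sum;
    }

    static double totalWrittenMb() {
        double sum = 0;
        for (Flush f : FLUSHES) {
            sum += f.writtenMb();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.printf("subtracted from memstoreSize: %.1fm%n", totalReportedMb());
        System.out.printf("memsize actually flushed:     %.1fm%n", totalWrittenMb());
        System.out.printf("over-subtracted:              %.1fm%n", totalReportedMb() - totalWrittenMb());
    }
}
```

Under those assumptions the three flushes subtract roughly 269.7m in total while writing only about 134.9m, which is consistent with memstoreSize ending up negative.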





[jira] [Issue Comment Edited] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231361#comment-13231361
 ] 

Zhihong Yu edited comment on HBASE-5568 at 3/16/12 4:41 PM:


Integrated to TRUNK.

Thanks for the patch Chunhui.

Thanks for the review Ramkrishna and Lars.

  was (Author: zhi...@ebaysf.com):
Integrated to TRUNK.

Thanks for the patch Chunhui.
  





[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231363#comment-13231363
 ] 

Hadoop QA commented on HBASE-5593:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518698/HBASE-5593.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1206//console

This message is automatically generated.

 Reverse DNS resolution in regionServerStartup() does not strip trailing dot
 ---

 Key: HBASE-5593
 URL: https://issues.apache.org/jira/browse/HBASE-5593
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7

 Attachments: HBASE-5593.patch


 HBASE-4109 covered the removal of trailing dots in PTR records from reverse 
 DNS lookups.  We seem to have missed a case in HMaster#regionServerStartup().
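
For illustration only, stripping the trailing dot from a reverse-lookup result can be as small as the helper below. The class and method names are hypothetical and this is not the code from the attached patch; it assumes at most one trailing dot needs to be removed.

```java
final class Hostnames {
    private Hostnames() {}

    /** Returns the name without a single trailing dot, as found on fully qualified PTR results. */
    static String stripTrailingDot(String hostname) {
        if (hostname != null && hostname.endsWith(".")) {
            return hostname.substring(0, hostname.length() - 1);
        }
        return hostname;
    }
}
```

A name without a trailing dot, or a null, is returned unchanged, so the helper is safe to apply to every lookup result.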





[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread David S. Wang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231370#comment-13231370
 ] 

David S. Wang commented on HBASE-5593:
--

This will not apply to trunk as the robot is trying to do, because the fix is 
only applicable to 0.90.x.

Also, I can add a test case if necessary, though I think the fix is fairly 
obvious in this case.  Let me know.






[jira] [Created] (HBASE-5594) Unable to stop a master that's waiting on -ROOT- during initialization

2012-03-16 Thread Jean-Daniel Cryans (Created) (JIRA)
Unable to stop a master that's waiting on -ROOT- during initialization
--

 Key: HBASE-5594
 URL: https://issues.apache.org/jira/browse/HBASE-5594
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.94.0, 0.96.0


We just had a case where the master (which had just been restarted) was having a hard 
time assigning -ROOT- (all the PRI handlers were already full), so we tried to 
shut down the cluster. Even though all the RS closed down properly, the master 
kept running, blocked on:

{noformat}
master-sv4r20s12,10302,1331916142866 prio=10 tid=0x7f3708008800 nid=0x4b20 in Object.wait() [0x7f370d1d]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
    - locked 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
    - locked 0x0006030be3f8 (a org.apache.hadoop.hbase.zookeeper.RootRegionTracker)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:313)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:571)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:336)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

I haven't checked the 0.90 code; we got this on 0.92.1.
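
The trace shows the master parked in Object.wait() with nothing to wake it when a stop is requested. One possible shape for a wait that can be stopped is sketched below. StoppableTracker is a hypothetical class, not ZooKeeperNodeTracker, and this is not a proposed patch; it assumes the waiter may poll in short slices and that stop() is called from the shutdown path.

```java
final class StoppableTracker {
    private Object data;      // guarded by this
    private boolean stopped;  // guarded by this

    /** Publishes the awaited value and wakes any waiter. */
    synchronized void setData(Object value) {
        data = value;
        notifyAll();
    }

    /** Requests shutdown and wakes any waiter. */
    synchronized void stop() {
        stopped = true;
        notifyAll();
    }

    /** Waits until data is set or stop() is called; returns null if stopped or interrupted first. */
    synchronized Object blockUntilAvailable() {
        while (data == null && !stopped) {
            try {
                // Bounded wait: the loop re-checks the flag even if a notify is missed.
                wait(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return data;
    }
}
```

The point of the sketch is only that the loop condition includes the stop flag; a caller such as the master's initialization would then have to treat a null result as "shutting down".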





[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5206:
--

Attachment: 5206_trunk_latest_3.patch

Patch v3 addresses my review comments.
TestAdmin#testEnableDisableAddColumnDeleteColumn passes.

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk-v3.patch, 
 5206_trunk_1.patch, 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231418#comment-13231418
 ] 

Zhihong Yu commented on HBASE-5549:
---

TestMasterZKSessionRecovery has been removed. Is it covered by other tests now?

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 
 5549.v8.patch, 5549.v9.patch, nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can also happen in real life, when:
 - master and zookeeper start
 - zookeeper connection is cut
 - master enters the retry loop
 - in the meantime the session expires
 - the network comes back, the session is recreated
 - the retries continue, but on the wrong object, and hence fail.
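
The last step of the sequence above, retrying on a stale object, can be sketched in isolation. Everything here is hypothetical (RetryOnCurrent, its Connection interface, the use of IllegalStateException for an expired session); it is not RecoverableZooKeeper. It assumes the caller can supply the current handle on demand, so that each attempt sees a handle recreated after session expiry.

```java
import java.util.function.Supplier;

final class RetryOnCurrent {
    private RetryOnCurrent() {}

    /** Hypothetical connection handle; read() throws IllegalStateException once its session is gone. */
    interface Connection {
        String read(String path);
    }

    /**
     * Retries read() up to maxAttempts times. The handle is looked up again on
     * every attempt, so one that was recreated in the meantime is picked up.
     */
    static String readWithRetry(Supplier<Connection> current, String path, int maxAttempts) {
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return current.get().read(path);
            } catch (IllegalStateException e) {
                // Remember the failure and retry against whatever handle is current now.
                last = e;
            }
        }
        throw last != null ? last : new IllegalStateException("maxAttempts must be positive");
    }
}
```

If the retry loop instead captured the handle once before looping, every attempt would hit the expired session, which is the failure described in the list.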





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231441#comment-13231441
 ] 

nkeywal commented on HBASE-5549:


Yes, see HBASE-5572 for the reasons...









[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231444#comment-13231444
 ] 

nkeywal commented on HBASE-5549:


Can't create a review, I got error 500 as well...






[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231447#comment-13231447
 ] 

Hadoop QA commented on HBASE-5206:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12518704/5206_trunk_latest_3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.TestZooKeeper

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1207//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1207//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1207//console

This message is automatically generated.






[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231453#comment-13231453
 ] 

nkeywal commented on HBASE-5549:


Now it works: https://reviews.apache.org/r/4391/






[jira] [Created] (HBASE-5595) Fix NoSuchMethodException in 0.92 when running on local filesystem

2012-03-16 Thread stack (Created) (JIRA)
Fix NoSuchMethodException in 0.92 when running on local filesystem
--

 Key: HBASE-5595
 URL: https://issues.apache.org/jira/browse/HBASE-5595
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.92.2


Fix this ugly exception that shows up when running 0.92.1 on the local filesystem:

{code}
2012-03-16 10:54:48,351 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@301abf87
java.lang.NoSuchMethodException: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
    at java.lang.Class.getDeclaredMethod(Class.java:1937)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getGetNumCurrentReplicas(HLog.java:425)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:408)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:331)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1229)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1218)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:937)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:648)
    at java.lang.Thread.run(Thread.java:680)
{code}
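
The trace comes from a reflective lookup (Class.getDeclaredMethod) of a method that the local-filesystem output stream does not have. A minimal sketch of probing for an optional method without logging a stack trace follows. OptionalMethod is a hypothetical helper, not the HLog code, and it assumes that a missing method is an expected, non-error outcome.

```java
import java.lang.reflect.Method;

final class OptionalMethod {
    private OptionalMethod() {}

    /** Returns the named no-arg method declared on the target's class, or null when it is absent. */
    static Method find(Object target, String name) {
        try {
            return target.getClass().getDeclaredMethod(name);
        } catch (NoSuchMethodException e) {
            // Expected on types that lack the method: report one line, not a stack trace.
            System.err.println(name + " not available on " + target.getClass().getName());
            return null;
        } catch (SecurityException e) {
            // Reflection was refused; treat the method as unavailable.
            System.err.println(name + " not accessible on " + target.getClass().getName());
            return null;
        }
    }
}
```

Callers check the result for null once and skip the feature when the method is missing.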







[jira] [Commented] (HBASE-5520) Support reseek() at RegionScanner

2012-03-16 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231459#comment-13231459
 ] 

Lars Hofhansl commented on HBASE-5520:
--

+1 on 0.94.
Another question about the patch:
{code}
+public synchronized boolean reseek(byte[] row) throws IOException {
...
+  startRegionOperation();
{code}
# why start a region operation here? This is only called from a 
coprocessor, right? So it should already be in a region operation.
# why synchronized?


 Support reseek() at RegionScanner
 -

 Key: HBASE-5520
 URL: https://issues.apache.org/jira/browse/HBASE-5520
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.92.0
Reporter: Anoop Sam John
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.96.0

 Attachments: HBASE-5520_1.patch, HBASE-5520_2.patch, 
 HBASE-5520_3.patch


 reseek() is not supported currently at the RegionScanner level. We can 
 support the same.
 This is created following the discussion under HBASE-2038
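
As a rough illustration of what a reseek() contract usually means, the sketch below seeks forward only, from the current position, over sorted rows. ForwardScanner is a hypothetical toy over strings, not RegionScanner or the attached patch. It assumes reseek should land on the first row at or after the target, never move backwards, and report whether any rows remain.

```java
import java.util.ArrayList;
import java.util.List;

final class ForwardScanner {
    private final List<String> rows; // sorted ascending
    private int pos;                 // index of the next row to return

    ForwardScanner(List<String> sortedRows) {
        this.rows = new ArrayList<>(sortedRows);
    }

    /** Returns the next row, or null at the end. */
    String next() {
        return pos < rows.size() ? rows.get(pos++) : null;
    }

    /**
     * Advances to the first row at or after target, starting from the current
     * position. Never moves backwards. Returns true if rows remain.
     */
    boolean reseek(String target) {
        while (pos < rows.size() && rows.get(pos).compareTo(target) < 0) {
            pos++;
        }
        return pos < rows.size();
    }
}
```

Because the scan starts at the current position rather than at the beginning, a reseek to a nearby row is cheap, which is the usual reason for offering it alongside a full seek.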





[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5206:
--

Attachment: (was: 5206_trunk_latest_3.patch)

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231464#comment-13231464
 ] 

Hudson commented on HBASE-5568:
---

Integrated in HBase-TRUNK #2684 (See 
[https://builds.apache.org/job/HBase-TRUNK/2684/])
HBASE-5568 Multi concurrent flushcache() for one region could cause data 
loss (Chunhui) (Revision 1301639)

 Result = SUCCESS
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch


 We could call HRegion#flushcache() concurrently now through 
 HRegionServer#splitRegion or HRegionServer#flushRegion by HBaseAdmin.
 However, we find if HRegion#internalFlushcache() is called concurrently by 
 multi thread, HRegion.memstoreSize will be calculated wrong.
 At the end of HRegion#internalFlushcache(), we will do 
 this.addAndGetGlobalMemstoreSize(-flushsize), but the flushsize may not the 
 actual memsize which flushed to hdfs. It cause HRegion.memstoreSize is 
 negative and prevent next flush if we close this region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~128.1m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~134.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569ea3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~6.8m for region writetest1,,1331454657410.e9d827913a056e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction requested=true
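
 To make the accounting problem concrete: in the log above the three flushes report ~128.1m, ~134.8m and ~6.8m as flushed, while the per-store memsize values of the second and third flush add up to only about 6.7m and 95k. The following is a minimal, single-threaded Java sketch that replays such an interleaving deterministically. It is not HBase code: the class and method names are hypothetical, and subtracting the drained byte count is only one possible remedy, not necessarily what the attached patches do.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a region's memstore accounting; not HBase code. */
final class FlushAccountingSketch {
    // Counter that plays the role of HRegion.memstoreSize.
    private final AtomicLong memstoreSize = new AtomicLong();
    // Bytes that the next snapshot would actually drain.
    private long pendingBytes;

    void write(long bytes) {
        pendingBytes += bytes;
        memstoreSize.addAndGet(bytes);
    }

    long readCounter() {
        return memstoreSize.get();
    }

    /** Drains the pending bytes and reports how many were really taken. */
    long snapshot() {
        long drained = pendingBytes;
        pendingBytes = 0;
        return drained;
    }

    long finishFlush(long bytesToSubtract) {
        return memstoreSize.addAndGet(-bytesToSubtract);
    }

    /** Both flushes sample the counter before either one takes its snapshot. */
    static long racyInterleaving() {
        FlushAccountingSketch region = new FlushAccountingSketch();
        region.write(128);
        long sizeA = region.readCounter(); // flush A samples 128
        long sizeB = region.readCounter(); // flush B also samples 128
        region.snapshot();                 // A really drains 128
        region.snapshot();                 // B really drains 0
        region.finishFlush(sizeA);
        return region.finishFlush(sizeB);  // counter goes negative
    }

    /** Same interleaving, but each flush subtracts what it really drained. */
    static long drainedInterleaving() {
        FlushAccountingSketch region = new FlushAccountingSketch();
        region.write(128);
        long drainedA = region.snapshot();
        long drainedB = region.snapshot();
        region.finishFlush(drainedA);
        return region.finishFlush(drainedB);
    }

    public static void main(String[] args) {
        System.out.println("sampled size subtracted: " + racyInterleaving());
        System.out.println("drained size subtracted: " + drainedInterleaving());
    }
}
```

 In this model the first variant leaves the counter at -128 and the second at 0, which mirrors the negative HRegion.memstoreSize described in the report.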

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5206:
--

Attachment: (was: 5206_trunk-v3.patch)

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231467#comment-13231467
 ] 

Zhihong Yu commented on HBASE-5568:
---

Integrated to 0.94 as well.





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231491#comment-13231491
 ] 

stack commented on HBASE-5549:
--

@N LGTM.  Change reconnectAfterExpiry to reconnectAfterExpiration and post 
patch here.   Will run it through hadoopqa then commit if all good.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v6.patch, 5549.v7.patch, 
 5549.v8.patch, 5549.v9.patch, nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can happen in real life, for example when:
 - master and zookeeper start
 - zookeeper connection is cut
 - master enters the retry loop
 - in the meantime the session expires
 - the network comes back, the session is recreated
 - the retries continue, but on the wrong object, and hence fail.
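
 The failure mode in the steps above (retrying on an object that has since been replaced) can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: Session is a hypothetical stand-in rather than the ZooKeeper client API, and reading the current handle through a holder on every attempt is one possible way to avoid the stale reference, not a description of the attached patches.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class RetryOnCurrentSessionSketch {
    /** Hypothetical stand-in for a connection handle; not the ZooKeeper API. */
    static final class Session {
        final int id;
        volatile boolean expired;

        Session(int id) {
            this.id = id;
        }

        String read() {
            if (expired) {
                throw new IllegalStateException("session " + id + " expired");
            }
            return "data-from-session-" + id;
        }
    }

    /** Retries op, re-reading the current session from the holder each time. */
    static <T> T retry(AtomicReference<Session> holder,
                       Function<Session, T> op, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalStateException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Session current = holder.get(); // never cache across attempts
            try {
                return op.apply(current);
            } catch (IllegalStateException e) {
                last = e;
            }
        }
        throw last;
    }

    /** The session is replaced after expiry; retries see the new one. */
    static String retryThroughHolder() {
        AtomicReference<Session> holder = new AtomicReference<>(new Session(1));
        holder.get().expired = true;   // session expires
        holder.set(new Session(2));    // watcher and session are recreated
        return retry(holder, Session::read, 3);
    }

    /** Retrying on the old object keeps failing, as in the report. */
    static boolean retryOnStaleObjectFails() {
        Session stale = new Session(1);
        stale.expired = true;
        try {
            retry(new AtomicReference<>(stale), Session::read, 3);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```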





[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231495#comment-13231495
 ] 

Mikhail Bautin commented on HBASE-5581:
---

Is this OK to commit to trunk? Could someone +1? Thanks!

 Creating a table with invalid syntax does not give an error message when it 
 fails
 -

 Key: HBASE-5581
 URL: https://issues.apache.org/jira/browse/HBASE-5581
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Binu John
Priority: Minor
 Attachments: D2343.1.patch


 Creating a table with invalid syntax does not give an error message when it 
 fails. In this case, it doesn't actually create the CF requested, but doesn't 
 give any indication to the user that it failed.
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
 0 row(s) in 3.0930 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                                                      ENABLED
  {NAME => 'test', FAMILIES => []}                                                true
 1 row(s) in 0.1430 seconds
 
 Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate 
 stanza works fine, so the feature is fine. 
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
 0 row(s) in 2.7860 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                                                      ENABLED
  {NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE',   true
  BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS
   => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '
  0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
 
 We should throw an error if we can't create the CF so it's clear to the user.
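
 As an illustration of the requested behaviour, here is a minimal Java sketch of a check that refuses a create request instead of silently producing a table without column families. It is purely hypothetical: the shell itself is Ruby, the class, method names and error messages below are invented for this example, and treating NUMREGIONS and SPLITALGO as table-level keys is an assumption taken from the two commands quoted above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CreateSpecCheckSketch {
    // Keys assumed to belong in their own stanza, per the report above.
    private static final Set<String> TABLE_LEVEL_KEYS =
        new HashSet<>(Arrays.asList("NUMREGIONS", "SPLITALGO"));

    /** Throws if a stanza mixes NAME with table-level keys or no family exists. */
    static void validate(List<Map<String, Object>> stanzas) {
        int families = 0;
        for (Map<String, Object> stanza : stanzas) {
            if (!stanza.containsKey("NAME")) {
                continue; // table-level stanza, nothing to check here
            }
            for (String key : stanza.keySet()) {
                if (TABLE_LEVEL_KEYS.contains(key)) {
                    throw new IllegalArgumentException(
                        key + " must go in a separate stanza");
                }
            }
            families++;
        }
        if (families == 0) {
            throw new IllegalArgumentException(
                "table must have at least one column family");
        }
    }

    static boolean accepts(List<Map<String, Object>> stanzas) {
        try {
            validate(stanzas);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** Builds a stanza from alternating key/value arguments. */
    static Map<String, Object> stanza(Object... keyValues) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put((String) keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** First command from the report: everything in one stanza. */
    static boolean mixedStanzaAccepted() {
        return accepts(Arrays.asList(
            stanza("NAME", "test", "VERSIONS", 1,
                   "NUMREGIONS", 20, "SPLITALGO", "HexStringSplit")));
    }

    /** Second command from the report: split options in their own stanza. */
    static boolean separateStanzasAccepted() {
        return accepts(Arrays.asList(
            stanza("NAME", "test", "VERSIONS", 1),
            stanza("NUMREGIONS", 20, "SPLITALGO", "HexStringSplit")));
    }
}
```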





[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231502#comment-13231502
 ] 

Lars Hofhansl commented on HBASE-5581:
--

+1





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231507#comment-13231507
 ] 

stack commented on HBASE-5568:
--

+1 Good find Chunhui.





[jira] [Updated] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5581:
-

Attachment: 5581trunk.patch

What I committed to trunk and 0.94.  It's a one-liner only.

 Creating a table with invalid syntax does not give an error message when it 
 fails
 -

 Key: HBASE-5581
 URL: https://issues.apache.org/jira/browse/HBASE-5581
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Binu John
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: 5581trunk.patch, D2343.1.patch






[jira] [Resolved] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-5581.
--

  Resolution: Fixed
Hadoop Flags: Reviewed

Thanks for the patch, Binu.  Committed to trunk and 0.94.





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231517#comment-13231517
 ] 

Zhihong Yu commented on HBASE-5568:
---

Integrated to 0.90 branch as well.





[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231526#comment-13231526
 ] 

Hadoop QA commented on HBASE-5206:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12518712/5206_trunk_latest_3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1208//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1208//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1208//console

This message is automatically generated.





[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231538#comment-13231538
 ] 

Zhihong Yu commented on HBASE-5206:
---

Integrated 5206_trunk_latest_3.patch to TRUNK.





[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231539#comment-13231539
 ] 

Hudson commented on HBASE-5581:
---

Integrated in HBase-0.94 #34 (See 
[https://builds.apache.org/job/HBase-0.94/34/])
HBASE-5581 Creating a table with invalid syntax does not give an error 
message when it fails (Revision 1301690)

 Result = SUCCESS
stack : 
Files : 
* /hbase/branches/0.94/src/main/ruby/hbase/admin.rb


 Creating a table with invalid syntax does not give an error message when it 
 fails
 -

 Key: HBASE-5581
 URL: https://issues.apache.org/jira/browse/HBASE-5581
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Binu John
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: 5581trunk.patch, D2343.1.patch


 Creating a table with invalid syntax does not give an error message when it 
 fails. In this case, it doesn't actually create the CF requested, but doesn't 
 give any indication to the user that it failed.
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS => 20, SPLITALGO => HexStringSplit, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}
 0 row(s) in 3.0930 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                 ENABLED
  {NAME => 'test', FAMILIES => []}           true
 1 row(s) in 0.1430 seconds
 
 Putting {NUMREGIONS => 20, SPLITALGO => HexStringSplit} into a separate 
 stanza works fine, so the feature is fine. 
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => HexStringSplit}
 0 row(s) in 2.7860 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                                                        ENABLED
  {NAME => 'test', FAMILIES => [{NAME => 'test', DATA_BLOCK_ENCODING => 'NONE',     true
  BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS
   => '1', TTL => '2147483647', BLOCKSIZE => '65536', BLOOMFILTER_ERRORRATE => '
  0.01', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
 
 We should throw an error if we can't create the CF so it's clear to the user.





[jira] [Commented] (HBASE-5568) Multi concurrent flushcache() for one region could cause data loss

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231540#comment-13231540
 ] 

Hudson commented on HBASE-5568:
---

Integrated in HBase-0.94 #34 (See 
[https://builds.apache.org/job/HBase-0.94/34/])
HBASE-5568 Multi concurrent flushcache() for one region could cause data 
loss (Chunhui) (Revision 1301676)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java


 Multi concurrent flushcache() for one region could cause data loss
 --

 Key: HBASE-5568
 URL: https://issues.apache.org/jira/browse/HBASE-5568
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5568-90.patch, HBASE-5568.patch, HBASE-5568.patch, 
 HBASE-5568v2.patch


 HRegion#flushcache() can now be called concurrently, through 
 HRegionServer#splitRegion or through HRegionServer#flushRegion invoked by 
 HBaseAdmin. However, if HRegion#internalFlushcache() is called concurrently 
 by multiple threads, HRegion.memstoreSize is calculated incorrectly.
 At the end of HRegion#internalFlushcache() we call 
 this.addAndGetGlobalMemstoreSize(-flushsize), but flushsize may not be the 
 actual memstore size that was flushed to HDFS. This can make 
 HRegion.memstoreSize negative, which prevents the next flush if we close 
 this region.
 Logs in RS for region e9d827913a056e696c39bc569ea3
 2012-03-11 16:31:36,690 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 128.0m
 2012-03-11 16:31:37,999 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/8162481165586107427, entries=153106, sequenceid=619316544, 
 memsize=59.6m, filesize=31.2m
 2012-03-11 16:31:38,830 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 134.8m
 2012-03-11 16:31:39,458 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/3425971951499794221, entries=230183, sequenceid=619316544, 
 memsize=68.5m, filesize=26.6m
 2012-03-11 16:31:39,459 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~128.1m for region 
 writetest1,,1331454657410.e9d827913a
 056e696c39bc569ea3f99f. in 2769ms, sequenceid=619316544, compaction 
 requested=false
 2012-03-11 16:31:39,459 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Started memstore flush for 
 writetest1,,1331454657410.e9d827913a056e696c39bc569ea3
 f99f., current region memstore size 6.8m
 2012-03-11 16:31:39,529 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/1811012969998104626, entries=8002, sequenceid=619332759, 
 memsize=3.1m, filesize=1.6m
 2012-03-11 16:31:39,640 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/770333473623552048, entries=12231, sequenceid=619332759, 
 memsize=3.6m, filesize=1.4m
 2012-03-11 16:31:39,641 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~134.8m for region 
 writetest1,,1331454657410.e9d827913a
 056e696c39bc569ea3f99f. in 811ms, sequenceid=619332759, compaction 
 requested=true
 2012-03-11 16:31:39,707 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf1/5656568849587368557, entries=119, sequenceid=619332979, 
 memsize=47.4k, filesize=25.6k
 2012-03-11 16:31:39,775 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://dw74.kgb.sqa.cm4:9700/hbase-func1/writetest1/e9d827913a056e696c39bc569e
 a3f99f/cf2/794343845650987521, entries=157, sequenceid=619332979, 
 memsize=47.8k, filesize=19.3k
 2012-03-11 16:31:39,777 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~6.8m for region 
 writetest1,,1331454657410.e9d827913a05
 6e696c39bc569ea3f99f. in 318ms, sequenceid=619332979, compaction 
 requested=true
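The accounting problem described in this issue can be replayed without threads. The sketch below is not HBase code: the class MemstoreAccountingSketch and all of its methods are hypothetical, and the byte counts are made up. It assumes that each flush reads the size counter when it starts, but that the snapshot taken later may contain a different amount of data. Subtracting the size that was actually snapshotted is shown as one possible remedy, not necessarily what the attached patch does.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a region's memstore size counter; not HBase code.
class MemstoreAccountingSketch {
    private final AtomicLong memstoreSize = new AtomicLong(); // the counter
    private long buffered; // bytes really held in memory

    void write(long bytes) {
        buffered += bytes;
        memstoreSize.addAndGet(bytes);
    }

    long readCounter() {
        return memstoreSize.get();
    }

    // Takes everything currently buffered and reports how much that was.
    long takeSnapshot() {
        long taken = buffered;
        buffered = 0;
        return taken;
    }

    void finishFlush(long bytes) {
        memstoreSize.addAndGet(-bytes);
    }

    // Replays two overlapping flushes in one fixed interleaving.
    static long replay(boolean subtractSnapshotSize) {
        MemstoreAccountingSketch region = new MemstoreAccountingSketch();
        region.write(128);
        long estimateA = region.readCounter();  // flush A starts, sees 128
        region.write(7);
        long estimateB = region.readCounter();  // flush B starts, sees 135
        long snapshotA = region.takeSnapshot(); // A actually takes 135
        long snapshotB = region.takeSnapshot(); // B finds nothing left: 0
        region.finishFlush(subtractSnapshotSize ? snapshotA : estimateA);
        region.finishFlush(subtractSnapshotSize ? snapshotB : estimateB);
        return region.readCounter();
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // prints -128
        System.out.println(replay(true));  // prints 0
    }
}
```

With the start-of-flush estimates the counter ends below zero, which matches the symptom reported above; with the snapshot sizes it returns to zero.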


[jira] [Commented] (HBASE-5593) Reverse DNS resolution in regionServerStartup() does not strip trailing dot

2012-03-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231551#comment-13231551
 ] 

stack commented on HBASE-5593:
--

@Ted You mean a unit test for the Strings.domainNamePointerToHostName facility? 
(We don't want to test InetSocketAddress).

 Reverse DNS resolution in regionServerStartup() does not strip trailing dot
 ---

 Key: HBASE-5593
 URL: https://issues.apache.org/jira/browse/HBASE-5593
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7

 Attachments: HBASE-5593.patch


 HBASE-4109 covered the removal of trailing dots in PTR records from reverse 
 DNS lookups.  We seem to have missed a case in HMaster#regionServerStartup().
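For readers unfamiliar with the problem: a PTR record returned by a reverse DNS lookup is typically fully qualified and ends in a dot, which then fails to match the hostname used elsewhere. The sketch below shows the kind of normalization involved. The class and method names are hypothetical and are not the Strings.domainNamePointerToHostName facility mentioned in the comment above, whose exact behaviour is not shown in this thread.

```java
// Hypothetical helper, for illustration only.
class TrailingDotSketch {
    // Drops a single trailing dot from a PTR-style name; other input is returned unchanged.
    static String stripTrailingDot(String name) {
        if (name != null && name.endsWith(".")) {
            return name.substring(0, name.length() - 1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDot("rs1.example.com.")); // prints rs1.example.com
        System.out.println(stripTrailingDot("rs1.example.com"));  // prints rs1.example.com
    }
}
```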





[jira] [Commented] (HBASE-5588) Deprecate/remove AssignmentManager#clearRegionFromTransition

2012-03-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231553#comment-13231553
 ] 

stack commented on HBASE-5588:
--

+1 on patch set

 Deprecate/remove AssignmentManager#clearRegionFromTransition
 

 Key: HBASE-5588
 URL: https://issues.apache.org/jira/browse/HBASE-5588
 Project: HBase
  Issue Type: Sub-task
  Components: hbck
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5588-0.90.patch, hbase-5588-0.94.patch, 
 hbase-5588.patch


 This method is essentially a dupe of Assignment#regionOffline.  As suggested 
 in early review of HBASE-5128 - deprecate up to 0.94 and remove from 
 0.96/trunk.





[jira] [Resolved] (HBASE-5592) Make it easier to get a table from shell

2012-03-16 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-5592.
--

   Resolution: Fixed
Fix Version/s: 0.92.2
 Hadoop Flags: Reviewed

Committed trunk, 0.94, and 0.92 branches.  Thanks for the patch Ben.

 Make it easier to get a table from shell
 

 Key: HBASE-5592
 URL: https://issues.apache.org/jira/browse/HBASE-5592
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.0
Reporter: Ben West
Assignee: Ben West
Priority: Trivial
  Labels: shell
 Fix For: 0.92.2, 0.94.0

 Attachments: publicTable.patch


 The one argument constructor to HTable was removed at some point, which means 
 that you now have to pass in a Configuration to instantiate an HTable. This 
 is annoying for me when I create quick scripts.
 This JIRA is a tiny patch which lets you get an HTable instance in the shell 
 by doing
 {code}foo_table = @shell.hbase_table('foo').table{code}
 Basically, it is changing table to be a public member rather than a private 
 one.





[jira] [Commented] (HBASE-5563) HRegionInfo#compareTo add the comparison of regionId

2012-03-16 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231557#comment-13231557
 ] 

stack commented on HBASE-5563:
--

+1 on patch for trunk and 0.92  (again).  Older regionids should appear earlier 
in a sorted list than newer regionids as per this patch.  

 HRegionInfo#compareTo add the comparison of regionId
 

 Key: HBASE-5563
 URL: https://issues.apache.org/jira/browse/HBASE-5563
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0, 0.94.0, 0.96.0
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: HBASE-5563.patch, HBASE-5563v2.patch, 
 HBASE-5563v2.patch, hbase-5563-v3-0.92.patch, hbase-5563-v3.patch


 In the case where one region is assigned multiple times, two regions can 
 have the same table name, the same startKey, and the same endKey but different 
 regionIds, so they are treated as the same key in a TreeMap but as different 
 keys in a HashMap.
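The TreeMap/HashMap mismatch can be reproduced with a small stand-in class. The sketch below is not HRegionInfo: RegionKeySketch is hypothetical, it omits endKey for brevity, and the compareRegionId flag exists only to show both behaviours side by side. A TreeMap decides key identity through compareTo, while a HashMap uses equals and hashCode, so a compareTo that ignores regionId collapses two entries into one.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a region descriptor; not HBase's HRegionInfo.
class RegionKeySketch implements Comparable<RegionKeySketch> {
    final String table;
    final String startKey;
    final long regionId;
    final boolean compareRegionId; // toggles the behaviour under discussion

    RegionKeySketch(String table, String startKey, long regionId, boolean compareRegionId) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.compareRegionId = compareRegionId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RegionKeySketch)) {
            return false;
        }
        RegionKeySketch other = (RegionKeySketch) o;
        return table.equals(other.table) && startKey.equals(other.startKey)
            && regionId == other.regionId;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * table.hashCode() + startKey.hashCode()) + Long.hashCode(regionId);
    }

    @Override
    public int compareTo(RegionKeySketch other) {
        int c = table.compareTo(other.table);
        if (c == 0) {
            c = startKey.compareTo(other.startKey);
        }
        if (c == 0 && compareRegionId) {
            // Older (smaller) regionIds sort before newer ones.
            c = Long.compare(regionId, other.regionId);
        }
        return c;
    }

    // Returns {TreeMap size, HashMap size} after inserting two regions that differ only in regionId.
    static int[] mapSizes(boolean compareRegionId) {
        RegionKeySketch older = new RegionKeySketch("t1", "", 100L, compareRegionId);
        RegionKeySketch newer = new RegionKeySketch("t1", "", 200L, compareRegionId);
        Map<RegionKeySketch, String> tree = new TreeMap<>();
        Map<RegionKeySketch, String> hash = new HashMap<>();
        tree.put(older, "rs1");
        tree.put(newer, "rs2");
        hash.put(older, "rs1");
        hash.put(newer, "rs2");
        return new int[] {tree.size(), hash.size()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mapSizes(false))); // prints [1, 2]
        System.out.println(Arrays.toString(mapSizes(true)));  // prints [2, 2]
    }
}
```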





[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5578:
-

Attachment: 5589.txt

How about this.  Goes through Store and checks all Reader instances for null 
before using.  We were doing this in half the cases already.

Converts the NPE into a null warning.  Means we don't crash.  Puts off having 
to spend time on why the Reader is null at particular junctures.

Should go into 0.94?
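The approach described in this comment (check each Reader for null, warn, and carry on) might look roughly like the sketch below. It is an illustration under assumptions: the Reader interface, the indexSize method, and the warning text are hypothetical stand-ins, not the contents of the attached 5589.txt.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a null-tolerant aggregate; not HBase's Store.
class NullReaderSketch {
    // Stand-in for a store file reader.
    interface Reader {
        long indexSize();
    }

    // Sums index sizes, skipping null readers with a warning instead of throwing an NPE.
    static long totalIndexSize(List<Reader> readers) {
        long total = 0;
        for (Reader reader : readers) {
            if (reader == null) {
                System.err.println("WARN: store file reader is null; skipping");
                continue;
            }
            total += reader.indexSize();
        }
        return total;
    }

    // Two live readers and one null reader.
    static long demoTotal() {
        Reader ten = () -> 10L;
        Reader five = () -> 5L;
        return totalIndexSize(Arrays.asList(ten, null, five));
    }

    public static void main(String[] args) {
        System.out.println(demoTotal()); // prints 15
    }
}
```

As the comment notes, this trades a crash for a warning and leaves open the question of why the reader is null in the first place.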

 NPE when regionserver reported server load, caused rs stop.
 ---

 Key: HBASE-5578
 URL: https://issues.apache.org/jira/browse/HBASE-5578
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
Reporter: Storm Lee
Priority: Critical
 Fix For: 0.92.2

 Attachments: 5589.txt


 The regionserver log:
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 data3,60020,1331286604591: Unhandled exception: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
   at java.lang.Thread.run(Thread.java:662)
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
 loaded coprocessors are: []
 2012-03-11 11:55:37,808 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
 requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
 numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
 totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
 memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
 compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
 blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
 blockCacheHitCount=87713, blockCacheMissCount=22144560, 
 blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
 blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
 2012-03-11 11:55:37,992 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
 exception: null





[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5578:
-

Status: Patch Available  (was: Open)

 NPE when regionserver reported server load, caused rs stop.
 ---

 Key: HBASE-5578
 URL: https://issues.apache.org/jira/browse/HBASE-5578
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
Reporter: Storm Lee
Priority: Critical
 Attachments: 5589.txt


 The regionserver log:
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 data3,60020,1331286604591: Unhandled exception: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
   at java.lang.Thread.run(Thread.java:662)
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
 loaded coprocessors are: []
 2012-03-11 11:55:37,808 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
 requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
 numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
 totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
 memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
 compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
 blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
 blockCacheHitCount=87713, blockCacheMissCount=22144560, 
 blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
 blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
 2012-03-11 11:55:37,992 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
 exception: null





[jira] [Updated] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-03-16 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5578:
-

Fix Version/s: 0.92.2

 NPE when regionserver reported server load, caused rs stop.
 ---

 Key: HBASE-5578
 URL: https://issues.apache.org/jira/browse/HBASE-5578
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
Reporter: Storm Lee
Priority: Critical
 Fix For: 0.92.2

 Attachments: 5589.txt


 The regionserver log:
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 data3,60020,1331286604591: Unhandled exception: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
   at java.lang.Thread.run(Thread.java:662)
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
 loaded coprocessors are: []
 2012-03-11 11:55:37,808 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
 requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
 numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
 totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
 memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
 compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
 blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
 blockCacheHitCount=87713, blockCacheMissCount=22144560, 
 blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
 blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
 2012-03-11 11:55:37,992 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
 exception: null





[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231579#comment-13231579
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-TRUNK #2685 (See 
[https://builds.apache.org/job/HBase-TRUNK/2685/])
HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java


 ServerShutDownHandler And Disable/Delete should not happen parallely leading 
 to recreation of regions that were deleted
 ---

 Key: HBASE-5155
 URL: https://issues.apache.org/jira/browse/HBASE-5155
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.90.6

 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
 HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch


 ServerShutDownHandler and disable/delete table handler races.  This is not an 
 issue due to TM.
 - A regionserver goes down.  In our cluster the regionserver holds a lot of 
 regions.
 - A region R1 has two daughters D1 and D2.
 - The ServerShutdownHandler gets called, scans META, and gets all the 
 user regions.
 - In parallel, a table is disabled. (No problem in this step.)
 - Delete table is done.
 - The table and its regions are deleted, including R1, D1 and D2. (So META 
 is cleaned.)
 - Now ServerShutdownHandler starts to process the dead region:
 {code}
  if (hri.isOffline() && hri.isSplit()) {
    LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
      "; checking daughter presence");
   fixupDaughters(result, assignmentManager, catalogTracker);
 {code}
 As part of fixupDaughters, since the daughters D1 and D2 are missing for R1, 
 {code}
 if (isDaughterMissing(catalogTracker, daughter)) {
   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
   MetaEditor.addDaughter(catalogTracker, daughter, null);
   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
   // there then something wonky about the split -- things will keep going
   // but could be missing references to parent region.
   // And assign it.
   assignmentManager.assign(daughter, true);
 {code}
 we call assign on the daughters.
 After this, we again run the code below.
 {code}
 if (processDeadRegion(e.getKey(), e.getValue(),
 this.services.getAssignmentManager(),
 this.server.getCatalogTracker())) {
   this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 When the SSH scanned META, it contained R1, D1 and D2.
 So as part of the above code, D1 and D2, which were assigned by fixupDaughters,
 are assigned again by 
 {code}
 this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 This leads to a ZooKeeper bad-version error, which kills the master.
 The important part here is that the regions that were deleted are recreated, 
 which I think is more critical.
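The sequence of steps above boils down to acting on a stale scan: the handler lists regions, the table is deleted in between, and the handler then re-adds regions that no longer exist. The sketch below replays that ordering with plain collections. It is a deliberate simplification with hypothetical names (StaleScanSketch, replay), it treats the parent and daughters alike, and the recheck guard only illustrates the general idea of validating state before assigning; it is not a description of what the attached patches do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical replay of the race described above; not HBase code.
class StaleScanSketch {
    // Returns the contents of the simulated catalog after the shutdown handler finishes.
    static Set<String> replay(boolean recheckBeforeAssign) {
        Set<String> meta = new TreeSet<>(Arrays.asList("R1", "D1", "D2"));

        // The shutdown handler scans the catalog and keeps its own copy of the result.
        List<String> scanned = new ArrayList<>(meta);

        // Meanwhile the table is disabled and deleted, so the catalog is emptied.
        meta.clear();

        // The handler now works through its stale list.
        for (String region : scanned) {
            if (recheckBeforeAssign && !meta.contains(region)) {
                continue; // region is gone; do not bring it back
            }
            meta.add(region); // assignment re-adds the region
        }
        return meta;
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // prints [D1, D2, R1]
        System.out.println(replay(true));  // prints []
    }
}
```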





[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231577#comment-13231577
 ] 

Hudson commented on HBASE-5206:
---

Integrated in HBase-TRUNK #2685 (See 
[https://builds.apache.org/job/HBase-TRUNK/2685/])
HBASE-5206 Port HBASE-5155 to trunk (Ashutosh Jindal) (Revision 1301709)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java


 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5578) NPE when regionserver reported server load, caused rs stop.

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231589#comment-13231589
 ] 

Hadoop QA commented on HBASE-5578:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518726/5589.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 162 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestKeepDeletes
  org.apache.hadoop.hbase.regionserver.TestMinVersions
  org.apache.hadoop.hbase.regionserver.TestCompaction

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1209//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1209//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1209//console

This message is automatically generated.

 NPE when regionserver reported server load, caused rs stop.
 ---

 Key: HBASE-5578
 URL: https://issues.apache.org/jira/browse/HBASE-5578
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0
 Environment: centos6.2 hadoop-1.0.0 hbase-0.92.0
Reporter: Storm Lee
Priority: Critical
 Fix For: 0.92.2

 Attachments: 5589.txt


 The regionserver log:
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 data3,60020,1331286604591: Unhandled exception: null
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.Store.getTotalStaticIndexSize(Store.java:1788)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:994)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:800)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:776)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:678)
   at java.lang.Thread.run(Thread.java:662)
 2012-03-11 11:55:37,808 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: 
 loaded coprocessors are: []
 2012-03-11 11:55:37,808 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: 
 requestsPerSecond=1687, numberOfOnlineRegions=37, numberOfStores=37, 
 numberOfStorefiles=144, storefileIndexSizeMB=2, rootIndexSizeKB=2362, 
 totalStaticIndexSizeKB=229808, totalStaticBloomSizeKB=2166296, 
 memstoreSizeMB=2854, readRequestsCount=1352673, writeRequestsCount=113137586, 
 compactionQueueSize=8, flushQueueSize=3, usedHeapMB=7359, maxHeapMB=12999, 
 blockCacheSizeMB=32.31, blockCacheFreeMB=3867.52, blockCacheCount=38, 
 blockCacheHitCount=87713, blockCacheMissCount=22144560, 
 blockCacheEvictedCount=122, blockCacheHitRatio=0%, 
 blockCacheHitCachingRatio=99%, hdfsBlocksLocalityIndex=100
 2012-03-11 11:55:37,992 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Unhandled 
 exception: null





[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Status: Open  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch


 There is a retry mechanism in RecoverableZooKeeper, but when the session 
 expires, the whole ZooKeeperWatcher is recreated, hence the retry mechanism 
 does not work in this case. This is why a sleep is needed in 
 TestZooKeeper#testMasterSessionExpired: we need to wait for ZooKeeperWatcher 
 to be recreated before using the connection.
 This can also happen in real life, for example when:
 - master & zookeeper start
 - zookeeper connection is cut
 - master enters the retry loop
 - in the meantime the session expires
 - the network comes back, the session is recreated
 - the retries continue, but on the wrong object, hence they fail.
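
 The failure sequence above can be sketched in plain Java, without any HBase 
 or ZooKeeper classes. Connection, WatcherHolder and retry are hypothetical 
 names made up for this illustration; they are not the RecoverableZooKeeper 
 or ZooKeeperWatcher API. The sketch only shows why a retry loop that 
 captured the old object keeps failing after the session is recreated, while 
 a loop that looks the object up again on each attempt can succeed.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for a ZooKeeper connection; not an HBase class.
class Connection {
    private volatile boolean expired = false;

    void expire() { expired = true; }

    String read() {
        if (expired) throw new IllegalStateException("session expired");
        return "data";
    }
}

// Hypothetical holder that swaps in a fresh connection when the session is recreated.
class WatcherHolder {
    private final AtomicReference<Connection> current =
        new AtomicReference<>(new Connection());

    Connection get() { return current.get(); }

    void recreate() { current.set(new Connection()); }
}

class RetryDemo {
    // Retries the read; the supplier decides which connection object each attempt uses.
    static String retry(Supplier<Connection> source, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                return source.get().read();
            } catch (IllegalStateException e) {
                // this attempt failed; fall through and try again
            }
        }
        return null; // every attempt failed
    }

    public static void main(String[] args) {
        WatcherHolder holder = new WatcherHolder();
        Connection stale = holder.get();  // reference captured before the expiry
        stale.expire();                   // the session expires
        holder.recreate();                // the watcher is recreated

        System.out.println(retry(() -> stale, 3));  // retries on the wrong object
        System.out.println(retry(holder::get, 3));  // re-resolves the object each time
    }
}
```

 Re-resolving the object on every attempt is only one possible way out; the 
 attached patches may well take a different approach.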





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231598#comment-13231598
 ] 

nkeywal commented on HBASE-5549:


v11 with the comments taken into account... Thank you for the review.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-5549:
---

Attachment: 5549.v11.patch

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Commented] (HBASE-5581) Creating a table with invalid syntax does not give an error message when it fails

2012-03-16 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231600#comment-13231600
 ] 

Mikhail Bautin commented on HBASE-5581:
---

Binu: thanks for the patch!
Stack: thanks for committing!


 Creating a table with invalid syntax does not give an error message when it 
 fails
 -

 Key: HBASE-5581
 URL: https://issues.apache.org/jira/browse/HBASE-5581
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Binu John
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: 5581trunk.patch, D2343.1.patch


 Creating a table with invalid syntax does not give an error message when it 
 fails. In this case, it doesn't actually create the CF requested, but doesn't 
 give any indication to the user that it failed.
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, NUMREGIONS 
 => 20, SPLITALGO => "HexStringSplit", COMPRESSION => 'LZO', BLOOMFILTER => 
 'ROW'}
 0 row(s) in 3.0930 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                                  ENABLED
  {NAME => 'test', FAMILIES => []}                            true
 1 row(s) in 0.1430 seconds
 
 Putting {NUMREGIONS => 20, SPLITALGO => "HexStringSplit"} into a separate 
 stanza works fine, so the feature is fine. 
 create 'test', {NAME => 'test', VERSIONS => 1, BLOCKCACHE => true, 
 COMPRESSION => 'LZO', BLOOMFILTER => 'ROW'}, {NUMREGIONS => 20, SPLITALGO => 
 "HexStringSplit"}
 0 row(s) in 2.7860 seconds
 hbase(main):002:0> describe 'test'
 DESCRIPTION                                                  ENABLED
  {NAME => 'test', FAMILIES => [{NAME => 'test',              true
  DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
  REPLICATION_SCOPE => '0', COMPRESSION => 'LZO',
  VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536',
  BLOOMFILTER_ERRORRATE => '0.01', ENCODE_ON_DISK => 'true',
  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
 
 We should throw an error if we can't create the CF so it's clear to the user.
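
 As a rough illustration of the "throw an error" behaviour asked for above, 
 here is a minimal Java sketch of a fail-fast check. It is not the shell's 
 actual code (the shell is written in Ruby) and not necessarily what the 
 attached patches do. StanzaValidator, TABLE_KEYS and validate are 
 hypothetical names, and the assumption that NUMREGIONS and SPLITALGO are the 
 only table-level keys is made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class StanzaValidator {
    // Keys assumed, for this sketch only, to belong in a separate table-level stanza.
    static final Set<String> TABLE_KEYS =
        new HashSet<>(Arrays.asList("NUMREGIONS", "SPLITALGO"));

    // Rejects a stanza that names a column family and also carries table-level keys.
    static void validate(Map<String, Object> stanza) {
        if (!stanza.containsKey("NAME")) {
            return; // table-level stanza, nothing to check here
        }
        for (String key : stanza.keySet()) {
            if (TABLE_KEYS.contains(key)) {
                throw new IllegalArgumentException(
                    "Column family '" + stanza.get("NAME")
                        + "' cannot take table option " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> mixed = new LinkedHashMap<>();
        mixed.put("NAME", "test");
        mixed.put("VERSIONS", 1);
        mixed.put("NUMREGIONS", 20);
        try {
            validate(mixed);
        } catch (IllegalArgumentException e) {
            // The user sees the reason instead of a silently empty table.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```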





[jira] [Updated] (HBASE-5575) Configure Arcanist lint engine for HBase

2012-03-16 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5575:
--

Attachment: Enabling-lint-2012-03-16_13_40_37.patch

 Configure Arcanist lint engine for HBase
 

 Key: HBASE-5575
 URL: https://issues.apache.org/jira/browse/HBASE-5575
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: Enabling-lint-2012-03-16_13_40_37.patch


 We need to enable the Arcanist lint engine in HBase, so that a commit can be 
 checked by running arc lint.





[jira] [Commented] (HBASE-5575) Configure Arcanist lint engine for HBase

2012-03-16 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231604#comment-13231604
 ] 

Mikhail Bautin commented on HBASE-5575:
---

Reviewed at https://reviews.facebook.net/D2289.

 Configure Arcanist lint engine for HBase
 

 Key: HBASE-5575
 URL: https://issues.apache.org/jira/browse/HBASE-5575
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: Enabling-lint-2012-03-16_13_40_37.patch






[jira] [Commented] (HBASE-5575) Configure Arcanist lint engine for HBase

2012-03-16 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231613#comment-13231613
 ] 

Phabricator commented on HBASE-5575:


mbautin has committed the revision [jira] [HBASE-5575] Configure Arcanist lint 
engine for HBase.

REVISION DETAIL
  https://reviews.facebook.net/D2289

COMMIT
  https://reviews.facebook.net/rHBASE1301751


 Configure Arcanist lint engine for HBase
 

 Key: HBASE-5575
 URL: https://issues.apache.org/jira/browse/HBASE-5575
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: Enabling-lint-2012-03-16_13_40_37.patch






[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231619#comment-13231619
 ] 

Zhihong Yu commented on HBASE-5206:
---

Integrated to 0.94 as well.

 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch


 This JIRA ports HBASE-5155 (ServerShutDownHandler And Disable/Delete should 
 not happen parallely leading to recreation of regions that were deleted) to 
 0.92 and TRUNK





[jira] [Commented] (HBASE-5206) Port HBASE-5155 to 0.92, 0.94, and TRUNK

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231636#comment-13231636
 ] 

Hudson commented on HBASE-5206:
---

Integrated in HBase-0.94 #36 (See 
[https://builds.apache.org/job/HBase-0.94/36/])
HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java


 Port HBASE-5155 to 0.92, 0.94, and TRUNK
 

 Key: HBASE-5206
 URL: https://issues.apache.org/jira/browse/HBASE-5206
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.2, 0.94.0, 0.96.0
Reporter: Zhihong Yu
Assignee: Ashutosh Jindal
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: 5206_92_1.patch, 5206_92_latest_1.patch, 
 5206_92_latest_2.patch, 5206_92_latest_3.patch, 5206_trunk_1.patch, 
 5206_trunk_latest_1.patch, 5206_trunk_latest_2.patch, 
 5206_trunk_latest_3.patch






[jira] [Commented] (HBASE-5155) ServerShutDownHandler And Disable/Delete should not happen parallely leading to recreation of regions that were deleted

2012-03-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231637#comment-13231637
 ] 

Hudson commented on HBASE-5155:
---

Integrated in HBase-0.94 #36 (See 
[https://builds.apache.org/job/HBase-0.94/36/])
HBASE-5206 port HBASE-5155 to 0.94 (Ashutosh Jindal) (Revision 1301737)

 Result = SUCCESS
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/TestDrainingServer.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java


 ServerShutDownHandler And Disable/Delete should not happen parallely leading 
 to recreation of regions that were deleted
 ---

 Key: HBASE-5155
 URL: https://issues.apache.org/jira/browse/HBASE-5155
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.90.6

 Attachments: HBASE-5155_1.patch, HBASE-5155_2.patch, 
 HBASE-5155_3.patch, HBASE-5155_latest.patch, hbase-5155_6.patch


 ServerShutDownHandler and disable/delete table handler races.  This is not an 
 issue due to TM.
 - A regionserver goes down.  In our cluster the regionserver holds a lot of 
 regions.
 - A region R1 has two daughters D1 and D2.
 - The ServerShutdownHandler gets called, scans META and gets all the 
 user regions.
 - In parallel, a table is disabled. (No problem in this step.)
 - Delete table is done.
 - The table and its regions are deleted, including R1, D1 and D2. (So META 
 is cleaned.)
 - Now ServerShutdownHandler starts to processDeadRegion
 {code}
  if (hri.isOffline() && hri.isSplit()) {
   LOG.debug("Offlined and split region " + hri.getRegionNameAsString() +
     "; checking daughter presence");
   fixupDaughters(result, assignmentManager, catalogTracker);
 {code}
 As part of fixupDaughters, since the daughters D1 and D2 are missing for R1 
 {code}
 if (isDaughterMissing(catalogTracker, daughter)) {
   LOG.info("Fixup; missing daughter " + daughter.getRegionNameAsString());
   MetaEditor.addDaughter(catalogTracker, daughter, null);
   // TODO: Log WARN if the regiondir does not exist in the fs.  If its not
   // there then something wonky about the split -- things will keep going
   // but could be missing references to parent region.
   // And assign it.
   assignmentManager.assign(daughter, true);
 {code}
 we call assign on the daughters.  
 After this we continue with the code below.
 {code}
 if (processDeadRegion(e.getKey(), e.getValue(),
     this.services.getAssignmentManager(),
     this.server.getCatalogTracker())) {
   this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 When the SSH scanned META it had R1, D1 and D2.
 So as part of the above code, D1 and D2, which were already assigned by 
 fixupDaughters, are assigned again by 
 {code}
 this.services.getAssignmentManager().assign(e.getKey(), true);
 {code}
 This leads to a zookeeper issue due to a bad version, which kills the master.
 The important part here is that the regions that were deleted are recreated, 
 which I think is more critical.
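
 The double assignment described above can be reproduced with a small 
 self-contained Java model. DoubleAssignDemo and replay are hypothetical 
 names; region names are plain strings and META is just a set. The recheck 
 flag models one conceivable guard (consulting the live catalog before 
 assigning) and is an assumption of this sketch, not necessarily how the 
 attached patches fix the race.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DoubleAssignDemo {
    // Counts how often each region is assigned while replaying a stale META snapshot.
    // liveMeta is the current catalog; when recheck is true, regions that are no
    // longer present in it are skipped.
    static Map<String, Integer> replay(List<String> snapshot,
                                       Set<String> liveMeta,
                                       boolean recheck) {
        Map<String, Integer> assigns = new HashMap<>();
        for (String region : snapshot) {
            if (recheck && !liveMeta.contains(region)) {
                continue; // deleted after the scan; do not recreate it
            }
            if (region.equals("R1")) {
                // Split parent: the daughter fix-up assigns both daughters.
                for (String daughter : Arrays.asList("D1", "D2")) {
                    assigns.merge(daughter, 1, Integer::sum);
                }
            } else {
                // Ordinary region from the snapshot: assigned directly.
                assigns.merge(region, 1, Integer::sum);
            }
        }
        return assigns;
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("R1", "D1", "D2"); // scanned before the delete
        Set<String> liveMeta = new HashSet<>();                   // table deleted since then

        System.out.println("without recheck: " + replay(snapshot, liveMeta, false));
        System.out.println("with recheck:    " + replay(snapshot, liveMeta, true));
    }
}
```

 Without the recheck, D1 and D2 are each assigned twice (once by the fix-up 
 for R1 and once from the snapshot) even though the table no longer exists.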





[jira] [Commented] (HBASE-5521) Move compression/decompression to an encoder specific encoding context

2012-03-16 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231644#comment-13231644
 ] 

Phabricator commented on HBASE-5521:


mbautin has commented on the revision HBASE-5521 [jira] Move 
compression/decompression to an encoder specific encoding context.

  Yongqiang: we now have a linter available in HBase trunk. Could you please 
run arc lint, resolve lint warnings, and resubmit the diff with arc diff 
--preview?

REVISION DETAIL
  https://reviews.facebook.net/D2097


 Move compression/decompression to an encoder specific encoding context
 --

 Key: HBASE-5521
 URL: https://issues.apache.org/jira/browse/HBASE-5521
 Project: HBase
  Issue Type: Improvement
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HBASE-5521.1.patch, HBASE-5521.D2097.1.patch, 
 HBASE-5521.D2097.2.patch, HBASE-5521.D2097.3.patch, HBASE-5521.D2097.4.patch, 
 HBASE-5521.D2097.5.patch, HBASE-5521.D2097.6.patch


 As part of working on HBASE-5313, we want to add a new columnar 
 encoder/decoder. It makes sense to move compression to be part of 
 encoder/decoder:
 1) a scanner for a columnar encoded block can lazily decompress only a 
 specific part of a key value object
 2) it avoids an extra byte copy from encoder to hblock-writer. 
 If there is no encoder specified for a writer, the HBlock.Writer will use a 
 default compression-context to do something very similar to today's code.





[jira] [Commented] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231646#comment-13231646
 ] 

Hadoop QA commented on HBASE-5549:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12518730/5549.v11.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportTsv
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1210//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1210//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1210//console

This message is automatically generated.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5549:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

Integrated to TRUNK.

Thanks for the patch, N.

Thanks for the review, Stack.

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5549) Master can fail if ZooKeeper session expires

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5549:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Master can fail if ZooKeeper session expires
 

 Key: HBASE-5549
 URL: https://issues.apache.org/jira/browse/HBASE-5549
 Project: HBase
  Issue Type: Bug
  Components: master, zookeeper
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5549.v10.patch, 5549.v11.patch, 5549.v6.patch, 
 5549.v7.patch, 5549.v8.patch, 5549.v9.patch, nochange.patch






[jira] [Updated] (HBASE-5572) KeeperException.SessionExpiredException management could be improved in Master

2012-03-16 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5572:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Resolved as part of HBASE-5549

 KeeperException.SessionExpiredException management could be improved in Master
 --

 Key: HBASE-5572
 URL: https://issues.apache.org/jira/browse/HBASE-5572
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.96.0

 Attachments: 5572.v1.patch, 5572.v2.patch, 5572.v2.patch, 
 5572.v2.patch


 Synthesis:
  1) TestMasterZKSessionRecovery distinguishes two cases on 
 SessionExpiredException. One is explicitly not managed. However, it seems 
 that there is no reason for this.
  2) The issue lies in ActiveMasterManager#blockUntilBecomingActiveMaster, a 
 quite complex function, with a useless recursive call.
  3) TestMasterZKSessionRecovery#testMasterZKSessionRecoverySuccess is 
 equivalent to TestZooKeeper#testMasterSessionExpired
  4) TestMasterZKSessionRecovery#testMasterZKSessionRecoveryFailure can be 
 removed if we merge the two cases mentioned above.
 Changes are:
  2) Changing ActiveMasterManager#blockUntilBecomingActiveMaster to have a 
 single case and remove recursion
  1) Removing TestMasterZKSessionRecovery
 Detailed justification:
 testMasterZKSessionRecoveryFailure says:
 {noformat}
   /**
    * Negative test of master recovery from zk session expiry.
    *
    * Starts with one master. Fakes the master zk session expired.
    * Ensures the master cannot recover the expired zk session since
    * the master zk node is still there.
    */
   public void testMasterZKSessionRecoveryFailure() throws Exception {
     MiniHBaseCluster cluster = TEST_UTIL.getHBaseCluster();
     HMaster m = cluster.getMaster();
     m.abort("Test recovery from zk session expired",
       new KeeperException.SessionExpiredException());
     assertTrue(m.isStopped());
   }
 {noformat}
 This test works, i.e. the assertion always holds.
 But do we really want this behavior?
 When looking at the code, we see that what is happening is strange:
 - HMaster#abort calls HMaster#abortNow. If HMaster#abortNow returns false, 
 HMaster#abort stops the master.
 - HMaster#abortNow checks the exception type. As it's a 
 SessionExpiredException, it will try to recover by calling 
 HMaster#tryRecoveringExpiredZKSession. If it cannot, it will return false 
 (and that will make HMaster#abort stop the master).
 - HMaster#tryRecoveringExpiredZKSession recreates a ZooKeeperConnection and 
 then tries to become the active master. If it cannot, it will return false 
 (and that will make HMaster#abort stop the master).
 - HMaster#becomeActiveMaster returns the result of 
 ActiveMasterManager#blockUntilBecomingActiveMaster. 
 blockUntilBecomingActiveMaster says it will return false if there is any 
 error preventing it from becoming the active master.
 - ActiveMasterManager#blockUntilBecomingActiveMaster reads ZK for the master 
 address. If it's the same port & host, it deletes the node, which starts 
 a recursive call to blockUntilBecomingActiveMaster. This second call succeeds 
 (we became the active master) and returns true. This result is ignored by the 
 first blockUntilBecomingActiveMaster: it returns false (even though we 
 actually became the active master), hence the whole chain of calls returns 
 false and HMaster#abort stops the master.
 In other words, the comment says "Ensures the master cannot recover the 
 expired zk session since the master zk node is still there." but we're 
 actually doing a check just for this and deleting the node. If we were not 
 ignoring the result, we would return true, so we would not stop the master, 
 so the test would fail.
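
 The ignored-result problem in the last step can be reduced to a few lines of 
 Java. This is a simplified model, not the real ActiveMasterManager code: 
 ActiveMasterSketch and both method names are hypothetical, and the ZooKeeper 
 state is collapsed into a single boolean saying whether our own stale master 
 node is still present.

```java
class ActiveMasterSketch {
    // Models the behaviour described above: the recursive call's result is dropped.
    static boolean blockUntilActiveBuggy(boolean staleNodePresent) {
        if (staleNodePresent) {
            // Delete our own stale node, then try again.
            blockUntilActiveBuggy(false); // the retry succeeds, but its result is ignored
            return false;                 // so the caller is told that we failed
        }
        return true; // became the active master
    }

    // Loop form: a single code path, no recursion, and the final outcome is returned.
    static boolean blockUntilActiveLoop(boolean staleNodePresent) {
        while (staleNodePresent) {
            staleNodePresent = false; // delete our own stale node and check again
        }
        return true; // became the active master
    }

    public static void main(String[] args) {
        System.out.println("buggy, stale node present: " + blockUntilActiveBuggy(true));
        System.out.println("loop,  stale node present: " + blockUntilActiveLoop(true));
    }
}
```

 With the result propagated, the caller would see true, the master would not 
 be stopped, and testMasterZKSessionRecoveryFailure would fail, which matches 
 the argument made above.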




