[jira] [Commented] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767422#comment-13767422
 ] 

Nicolas Liochon commented on HBASE-9461:


bq. was going to commit this since it some progress
sure. I was mainly hijacking the jira :-)

bq. The Delay stuff is unused I think. It was an experiment. Maybe I'll look at 
that next and purge it if I can.
It's my impression as well (the code is from HBASE-3899). The idea seems very 
good, but if it's not used, the complexity-to-usefulness ratio can't be good.

 Some doc and cleanup in RPCServer
 -

 Key: HBASE-9461
 URL: https://issues.apache.org/jira/browse/HBASE-9461
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Attachments: 9461.txt, 9461v2.txt, ipc2.ucls


 RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
 it's tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
 have made it more involved still.  This issue is about doing a bit of 
 untangling.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767479#comment-13767479
 ] 

Hudson commented on HBASE-9390:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #728 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/728/])
hbase-9390: coprocessors observers are not called during a recovery with the 
new log replay algorithm - 1 (jeffreyz: rev 1523172)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverInterface.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java


 coprocessors observers are not called during a recovery with the new log 
 replay algorithm
 -

 Key: HBASE-9390
 URL: https://issues.apache.org/jira/browse/HBASE-9390
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors, MTTR
Affects Versions: 0.95.2
Reporter: Nicolas Liochon
Assignee: Jeffrey Zhong
 Attachments: copro.patch, hbase-9390.patch, hbase-9390-v2.patch


 See the patch to reproduce the issue: If we activate log replay we don't have 
 the events on WAL restore.
 Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9518) getFakedKey() improvement

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767482#comment-13767482
 ] 

Liang Xie commented on HBASE-9518:
--

Hi [~stack], you can see the new TestKeyValue case:
if the last kv of the previous block and the first kv of the current block 
have the same postfix and differ by just 1 at a single offset, e.g. 100abcdefg 
and 101abcdefg, then before 9518, getShortMidpointKey() will fall back to the 
default right kv, i.e. 101abcdefg.
After 9518, it'll return 101, a shorter faked value, still reasonable, right? 
:)
And I found that this corner case exists in the current hbase test cases as 
well, so I'd like to get it into the community codebase too.
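
To make the 100abcdefg / 101abcdefg example concrete, here is a minimal sketch of one way to pick a short separator between two keys. It is not the HBASE-9518 patch and not the real getShortMidpointKey(); the class and method names are hypothetical, and it assumes the left key sorts strictly before the right key. It simply returns the shortest prefix of the right key that still sorts after the left key.

```java
import java.util.Arrays;

// Hypothetical illustration only; not HBase's getShortMidpointKey().
final class ShortSeparatorSketch {

    /**
     * Returns the shortest prefix of right that still sorts after left.
     * Assumes left sorts strictly before right (byte-wise comparison).
     */
    static byte[] shortestSeparator(byte[] left, byte[] right) {
        int limit = Math.min(left.length, right.length);
        int i = 0;
        // Skip the common prefix of the two keys.
        while (i < limit && left[i] == right[i]) {
            i++;
        }
        if (i >= right.length) {
            // right is a prefix of left (or equal): the precondition does not
            // hold, so fall back to a copy of the full right key.
            return Arrays.copyOf(right, right.length);
        }
        // Keep the common prefix plus the first byte of right that differs
        // from left (or extends past the end of left).
        return Arrays.copyOf(right, i + 1);
    }
}
```

With the keys from the comment, the common prefix is "10" and the first differing byte of the right key is '1', so the sketch yields "101", which sorts after 100abcdefg and not after 101abcdefg.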

 getFakedKey() improvement
 -

 Key: HBASE-9518
 URL: https://issues.apache.org/jira/browse/HBASE-9518
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.98.0, 0.96.1
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HBASE-9518.txt, HBASE-9518-v2.txt


 Make the faked key generation algorithm more aggressive.



[jira] [Commented] (HBASE-9502) HStore.seekToScanner should handle magic value

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767483#comment-13767483
 ] 

Liang Xie commented on HBASE-9502:
--

Hi [~stack], I think you need to apply the HBASE-9518 patch first, then run 
this test; I expect it to fail without the patch.

 HStore.seekToScanner should handle magic value
 --

 Key: HBASE-9502
 URL: https://issues.apache.org/jira/browse/HBASE-9502
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Scanners
Affects Versions: 0.98.0, 0.95.2, 0.96.1
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HBASE-9502.txt


 Due to the faked key, seekTo may return -2, and HStore.seekToScanner 
 should handle this corner case.



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767503#comment-13767503
 ] 

Liang Xie commented on HBASE-9519:
--

take it easy, nobody is pissed off :)

could we kick off another QA run manually on build server?

 fix NPE in EncodedScannerV2.getFirstKeyInBlock()
 

 Key: HBASE-9519
 URL: https://issues.apache.org/jira/browse/HBASE-9519
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.98.0, 0.96.1
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HBASE-9519.txt, HBASE-9519-v2.txt


 We observed a reproducible NPE while scanning a special table under a special 
 condition in our IntegratedTesting scenario; it was fixed by applying the 
 attached patch.
 org.apache.hadoop.hbase.client.ScannerCallable@67ee75a5, java.io.IOException: 
 java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1186)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1175)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2391)
 at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:456)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.getFirstKeyInBlock(HFileReaderV2.java:1071)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:547)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:159)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:142)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.getLastKey(HalfStoreFileReader.java:267)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesKeyRangeFilter(StoreFile.java:1543)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:375)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:298)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:262)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:149)
 at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2122)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3460)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1645)
 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1635)
 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2377)
 ... 5 more



[jira] [Commented] (HBASE-9528) Adaptive compaction

2013-09-14 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767504#comment-13767504
 ] 

Liang Xie commented on HBASE-9528:
--

Yeah, both central planning of compactions and a compaction scheduler seem 
more suitable :)


 Adaptive compaction
 ---

 Key: HBASE-9528
 URL: https://issues.apache.org/jira/browse/HBASE-9528
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 0.98.0
Reporter: Liang Xie

 Currently, compaction policy decisions are made per machine. We are 
 thinking of introducing decisions at cluster granularity, so that we can 
 improve the following cases based on the cluster's running status:
 1) Many nodes are compacting aggressively (we call it a cluster compaction 
 storm); we should throttle it.
 2) Do more compaction when traffic in the current cluster is low (similar to 
 the off-peak feature), not limited by a configured time range (like the 
 off-peak time range), but triggered by load, qps, or other metrics.
 Comments? Thanks



[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

2013-09-14 Thread Demai Ni (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767521#comment-13767521
 ] 

Demai Ni commented on HBASE-9047:
-

[~jdcryans] 

thank you so much for the review comments and suggestions. I will remove the 
'system.out.println', fix the typo, and remove the 'copyright' line. Also I 
will remove 'conf.setBoolean(HConstants.REPLICATION_ENABLE_KEY, true)', and 
change the testcase per your suggestion.

About the Thread.sleep(3): many thanks for pointing it out. It would be a 
bug in the making. I will replace it with a zookeeper check in a timeout loop.
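
A timeout loop of the kind mentioned above could look roughly like the following sketch. `WaitForSketch` and `waitFor` are hypothetical names, and the condition is a plain `BooleanSupplier` standing in for whatever zookeeper check the test actually needs; no HBase or ZooKeeper API is assumed here.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition until it holds or a timeout expires.
final class WaitForSketch {

    /**
     * Polls condition every intervalMs milliseconds for up to timeoutMs.
     * Returns true as soon as the condition holds, false on timeout or
     * if the waiting thread is interrupted.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt status for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Compared with a fixed sleep, the test returns as soon as the state is reached and fails with a clear timeout when it never is, instead of depending on how fast the machine happens to be.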

Let me look into the 'tool' class. I assume the key here is to make it a 
Runnable class and use the run() method as the main body. Thanks for the 
suggestion.

As for the code style, I am using eclipse and imported 
hbase_eclipse_formatter.xml (http://hbase.apache.org/book/ides.html), but 
judging from this and past patch submissions, I must be missing something. 
Is this the right approach to follow? Is there a style checking script that I 
can run before submitting? Thanks

Have a nice weekend

Demai 

 Tool to handle finishing replication when the cluster is offline
 

 Key: HBASE-9047
 URL: https://issues.apache.org/jira/browse/HBASE-9047
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Daniel Cryans
Assignee: Demai Ni
 Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch


 We're having a discussion on the mailing list about replicating the data on a 
 cluster that was shut down in an offline fashion. The motivation could be 
 that you don't want to bring HBase back up but still need that data on the 
 slave.
 So I have this idea of a tool that would be running on the master cluster 
 while it is down, although it could also run at any time. Basically it would 
 be able to read the replication state of each master region server, finish 
 replicating what's missing to all the slaves, and then clear that state in 
 zookeeper.
 The code that handles replication does most of that already, see 
 ReplicationSourceManager and ReplicationSource. Basically when 
 ReplicationSourceManager.init() is called, it will check all the queues in ZK 
 and try to grab those that aren't attached to a region server. If the whole 
 cluster is down, it will grab all of them.
 The beautiful thing here is that you could start that tool on all your 
 machines and the load will be spread out, but that might not be a big concern 
 if replication wasn't lagging since it would take a few seconds to finish 
 replicating the missing data for each region server.
 I'm guessing when starting ReplicationSourceManager you'd give it a fake 
 region server ID, and you'd tell it not to start its own source.
 FWIW the main difference in how replication is handled between Apache's HBase 
 and Facebook's is that the latter is always done separately from HBase itself. 
 This jira isn't about doing that.



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767522#comment-13767522
 ] 

stack commented on HBASE-9519:
--

[~xieliang007] I started it here 
https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/7227/ 
but hadoopqa is having some issues; one of its disks is full. Maybe this 
build will work.



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767537#comment-13767537
 ] 

Hadoop QA commented on HBASE-9519:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602951/HBASE-9519-v2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7227//console

This message is automatically generated.


[jira] [Resolved] (HBASE-9335) Zombie test detection should filter out non-HBase tests

2013-09-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-9335.
---

Resolution: Later

 Zombie test detection should filter out non-HBase tests
 ---

 Key: HBASE-9335
 URL: https://issues.apache.org/jira/browse/HBASE-9335
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu

 Zombie test detection in test-patch.sh sometimes picks up tests from other 
 TLP.
 e.g. from https://builds.apache.org/job/PreCommit-HBASE-Build/6869/console:
 {code}
 main prio=10 tid=0x091b4800 nid=0x7634 waiting on condition [0xf69b1000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hdfs.server.namenode.ha.TestFailoverWithBlockTokensEnabled.TestFailoverAfterAccessKeyUpdate(TestFailoverWithBlockTokensEnabled.java:159)
 {code}
 When the zombie test doesn't belong to the org.apache.hadoop.hbase namespace, 
 it shouldn't be listed.
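
The namespace rule described above can be sketched as a simple prefix filter over stack frames. The actual check would live in test-patch.sh; the Java below is only an illustration of the rule, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the namespace rule; not part of test-patch.sh.
final class ZombieFilterSketch {

    static final String HBASE_PREFIX = "org.apache.hadoop.hbase.";

    /** Keeps only the stack frames whose class is in the HBase namespace. */
    static List<String> hbaseFramesOnly(List<String> stackLines) {
        List<String> kept = new ArrayList<>();
        for (String line : stackLines) {
            String frame = line.trim();
            // Thread dumps prefix each frame with "at "; strip it if present.
            if (frame.startsWith("at ")) {
                frame = frame.substring(3).trim();
            }
            if (frame.startsWith(HBASE_PREFIX)) {
                kept.add(frame);
            }
        }
        return kept;
    }
}
```

Under this rule the org.apache.hadoop.hdfs frame in the dump above is dropped, so a thread like that one would not be reported as an HBase zombie.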



[jira] [Commented] (HBASE-9514) Prevent region from assigning before log splitting is done

2013-09-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767546#comment-13767546
 ] 

Jimmy Xiang commented on HBASE-9514:


What about the case where an RS is dead but the master doesn't know about it 
yet? So I was thinking of controlling it at the root, the AM#assign() method, 
the final place where an openRegion request is sent out to another RS.

 Prevent region from assigning before log splitting is done
 --

 Key: HBASE-9514
 URL: https://issues.apache.org/jira/browse/HBASE-9514
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 If a region is assigned before log splitting is done by the server shutdown 
 handler, the edits belonging to this region in the hlogs of the dead server 
 will be lost.
 Generally this is not an issue if users don't assign/unassign a region from 
 hbase shell or via hbase admin. These commands are marked for experts only in 
 the hbase shell help too.  However, chaos monkey doesn't care.
 If we can prevent such regions from being assigned at a bad time, it would 
 make things a little safer.



[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

Attachment: hbase-9523.patch

 Audit of hbase-common @InterfaceAudience.Public apis.
 -

 Key: HBASE-9523
 URL: https://issues.apache.org/jira/browse/HBASE-9523
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.95.2
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.98.0, 0.96.0

 Attachments: hbase-9523.patch


 Do an audit of all public classes to make sure we are only publicly exposing 
 what must be exposed.   
 This was done by comparing the Public only version of the javadoc generated 
 by HBASE-9517 to a local javadoc for the hbase-common module (cd 
 hbase-common; mvn javadoc:javadoc).  



[jira] [Updated] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-9523:
--

Status: Patch Available  (was: In Progress)



[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767551#comment-13767551
 ] 

Jonathan Hsieh commented on HBASE-9523:
---

The patch attached should have taken into account nick and stack's comments, 
the findings from the hbase-client work, and the previous patch that made 
unmarked elements private.



[jira] [Updated] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9480:
---

   Resolution: Fixed
Fix Version/s: 0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into 0.96 and trunk. Thanks.

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
 Fix For: 0.98.0, 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch


 Came across this issue (HBASE-9338 test):
 1. Client issues a request to move a region from ServerA to ServerB
 2. ServerA is compacting that region and doesn't close region immediately. In 
 fact, it takes a while to complete the request.
 3. The master, in the meantime, sends another close request.
 4. ServerA sends it a NotServingRegionException
 5. Master handles the exception, deletes the znode, and invokes regionOffline 
 for the said region.
 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
 deleted.
 The region is permanently offline.
 There are potentially other situations where, when a RegionServer is offline 
 and the client asks to move a region off that server, the master makes the 
 region offline.



[jira] [Updated] (HBASE-9457) Master could fail start if region server with system table is down

2013-09-14 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-9457:
---

Attachment: trunk-9457_v2.2.patch

Attached v2.2, rebased to the latest trunk.

 Master could fail start if region server with system table is down
 --

 Key: HBASE-9457
 URL: https://issues.apache.org/jira/browse/HBASE-9457
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Critical
 Attachments: trunk-9457.patch, trunk-9457_v2.1.patch, 
 trunk-9457_v2.2.patch, trunk-9457_v2.patch


 If the region server holding the system table is killed while the master is 
 starting, the master will hang waiting for the system table to be assigned, 
 which won't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767582#comment-13767582
 ] 

Hadoop QA commented on HBASE-9523:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12603201/hbase-9523.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/7228//console

This message is automatically generated.

 Audit of hbase-common @InterfaceAudience.Public apis.
 -

 Key: HBASE-9523
 URL: https://issues.apache.org/jira/browse/HBASE-9523
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.95.2
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.98.0, 0.96.0

 Attachments: hbase-9523.patch


 Do an audit of all public classes to make sure we are only publicly exposing 
 what must be exposed.   
 This was done by comparing the Public only version of the javadoc generated 
 by HBASE-9517 to a local javadoc for the hbase-common module (cd 
 hbase-common; mvn javadoc:javadoc).  



[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767588#comment-13767588
 ] 

Hudson commented on HBASE-9480:
---

FAILURE: Integrated in hbase-0.96 #48 (See 
[https://builds.apache.org/job/hbase-0.96/48/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523308)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java


 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
 Fix For: 0.98.0, 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch


 Came across this issue (HBASE-9338 test):
 1. Client issues a request to move a region from ServerA to ServerB.
 2. ServerA is compacting that region and doesn't close the region immediately. In 
 fact, it takes a while to complete the request.
 3. In the meantime, the master sends another close request.
 4. ServerA responds with a NotServingRegionException.
 5. Master handles the exception, deletes the znode, and invokes regionOffline 
 for the said region.
 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
 deleted.
 The region is permanently offline.
 There are potentially other situations where, when a RegionServer is offline 
 and the client asks for a region to be moved off that server, the master makes 
 the region offline.



[jira] [Commented] (HBASE-9338) Test Big Linked List fails on Hadoop 2.1.0

2013-09-14 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767600#comment-13767600
 ] 

Elliott Clark commented on HBASE-9338:
--

[~enis] did that run have this patch in it?

{code}
13/09/13 03:39:54 INFO actions.Action: Killing region 
server:hor8n10,60020,1379043329928
13/09/13 03:39:56 INFO actions.Action: Killed region 
server:hor8n10,60020,1379043329928. Reported num of rs:8
13/09/13 03:39:56 INFO actions.Action: Sleeping for:6
13/09/13 03:40:05 INFO actions.Action: Performing action: Move random region of 
table IntegrationTestBigLinkedList
{code}

So there's only 11 seconds in between the kill and the move; even though the 
chaos monkey thread should be sleeping for 60 seconds.  After this issue the 
move should always sleep 20 seconds before moving a region.  And it shouldn't 
happen in parallel with a kill.
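
For reference, the 11-second figure can be checked by parsing the two timestamps from the log excerpt above. This is only a minimal sketch: the class and method names are made up for illustration and are not part of HBase, and the "yy/MM/dd HH:mm:ss" layout is assumed from the log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative helper, not HBase code: measures the gap between two log lines.
class LogGap {
    // Assumed layout of the timestamp prefix in the log excerpt.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yy/MM/dd HH:mm:ss");

    // Seconds elapsed from the first timestamp to the second.
    static long secondsBetween(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, FMT);
        LocalDateTime b = LocalDateTime.parse(second, FMT);
        return Duration.between(a, b).getSeconds();
    }

    public static void main(String[] args) {
        // "Killing region server" line versus "Performing action: Move" line.
        System.out.println(secondsBetween("13/09/13 03:39:54", "13/09/13 03:40:05"));
    }
}
```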

 Test Big Linked List fails on Hadoop 2.1.0
 --

 Key: HBASE-9338
 URL: https://issues.apache.org/jira/browse/HBASE-9338
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Elliott Clark
Assignee: Elliott Clark
Priority: Blocker
 Fix For: 0.98.0, 0.96.0

 Attachments: HBASE-9338-0.patch, HBASE-9338-1.patch, 
 HBASE-9338-TESTING-2.patch, HBASE-9338-TESTING-3.patch






[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767606#comment-13767606
 ] 

Hudson commented on HBASE-9480:
---

SUCCESS: Integrated in HBase-TRUNK #4507 (See 
[https://builds.apache.org/job/HBase-TRUNK/4507/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523303)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java




[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767614#comment-13767614
 ] 

Hudson commented on HBASE-9480:
---

FAILURE: Integrated in hbase-0.96-hadoop2 #27 (See 
[https://builds.apache.org/job/hbase-0.96-hadoop2/27/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523308)
* 
/hbase/branches/0.96/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java




[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767650#comment-13767650
 ] 

Hudson commented on HBASE-9480:
---

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #729 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/729/])
HBASE-9480 Regions are unexpectedly made offline in certain failure conditions 
(jxiang: rev 1523303)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManagerOnCluster.java




[jira] [Updated] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9461:
-

   Resolution: Fixed
Fix Version/s: 0.98.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 Some doc and cleanup in RPCServer
 -

 Key: HBASE-9461
 URL: https://issues.apache.org/jira/browse/HBASE-9461
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 9461.txt, 9461v2.txt, ipc2.ucls


 RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
 its tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
 have made it more involved still.  This issue is about doing a bit of 
 untangling.



[jira] [Commented] (HBASE-9523) Audit of hbase-common @InterfaceAudience.Public apis.

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767655#comment-13767655
 ] 

stack commented on HBASE-9523:
--

lgtm +1



[jira] [Commented] (HBASE-9519) fix NPE in EncodedScannerV2.getFirstKeyInBlock()

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767657#comment-13767657
 ] 

stack commented on HBASE-9519:
--

Javadoc warning is from elsewhere.  I don't know this code very well.  Patch 
lgtm w/ this caveat.

 fix NPE in EncodedScannerV2.getFirstKeyInBlock()
 

 Key: HBASE-9519
 URL: https://issues.apache.org/jira/browse/HBASE-9519
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.98.0, 0.96.1
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HBASE-9519.txt, HBASE-9519-v2.txt


 We observed a reproducible NPE while scanning a special table under special 
 conditions in our IntegratedTesting scenario; it was fixed by applying the 
 attached patch.
 org.apache.hadoop.hbase.client.ScannerCallable@67ee75a5, java.io.IOException: 
 java.io.IOException: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1186)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:1175)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2391)
 at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:456)
 at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.getFirstKeyInBlock(HFileReaderV2.java:1071)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:547)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:159)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekBefore(HalfStoreFileReader.java:142)
 at 
 org.apache.hadoop.hbase.io.HalfStoreFileReader.getLastKey(HalfStoreFileReader.java:267)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.passesKeyRangeFilter(StoreFile.java:1543)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:375)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:298)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.getScannersNoCompaction(StoreScanner.java:262)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.init(StoreScanner.java:149)
 at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2122)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.init(HRegion.java:3460)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1645)
 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1635)
 at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1610)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2377)
 ... 5 more



[jira] [Commented] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767663#comment-13767663
 ] 

stack commented on HBASE-9529:
--

Hmm... in '@Public org.apache.hadoop.hbase', if no marking of what follows -- 
e.g. HCD -- means 'public', then +1.  If not, I think HCD, exceptions, etc. 
should be public.

Action is internal.
ConnectionUtils is internal.
Ditto HConnectable
HTableUtil should be private
MultiAction should be private

Below should be private too?  Internal.

MultiResponse

Below are superclasses of Put, etc., so probably public?

Mutation
Operation
OperationWithAttributes


ScanMetrics seems internal
Ditto ColumnInterpreter

RegionState comes out in API?  If not, should be private I'd say.







 Audit of hbase-client @InterfaceAudience.Public apis
 

 Key: HBASE-9529
 URL: https://issues.apache.org/jira/browse/HBASE-9529
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
 Fix For: 0.98.0, 0.96.0


 Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
 This is easier to do now that the we can publish only the public api javadoc 
 http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Commented] (HBASE-9473) Change UI to list 'system tables' rather than 'catalog tables'.

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767664#comment-13767664
 ] 

stack commented on HBASE-9473:
--

[~jdcryans] You going to let me commit this narrow-scoped UI-only patch or you 
want me to do the fix of catalog|system table throughout code base as part of 
this issue (you are a tough taskmaster)

 Change UI to list 'system tables' rather than 'catalog tables'.
 ---

 Key: HBASE-9473
 URL: https://issues.apache.org/jira/browse/HBASE-9473
 Project: HBase
  Issue Type: Bug
  Components: UI
Reporter: stack
Assignee: stack
 Fix For: 0.96.0

 Attachments: 9473.txt


 Minor, one-line, bit of polishing.



[jira] [Commented] (HBASE-9425) Starting a LocalHBaseCluster when 2181 is occupied results in Too many open files

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767665#comment-13767665
 ] 

stack commented on HBASE-9425:
--

[~jdcryans] Now we just fail if something is on 2181?  I suppose that is better than 
the current situation.

 Starting a LocalHBaseCluster when 2181 is occupied results in Too many open 
 files
 ---

 Key: HBASE-9425
 URL: https://issues.apache.org/jira/browse/HBASE-9425
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.98.0, 0.96.0

 Attachments: HBASE-9425.patch


 This bug was introduced via HBASE-6677 Random ZooKeeper port in test can 
 overrun max port.
 If 2181 is occupied but you start a LocalHBaseCluster (let's say you untar 
 hbase and start it right away) you'll get this:
 {noformat}
 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
 0.0.0.0/0.0.0.0:2181
 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
 0.0.0.0/0.0.0.0:2181
 13/09/03 10:38:13 INFO server.NIOServerCnxnFactory: binding to port 
 0.0.0.0/0.0.0.0:2181
 ...
 13/09/03 10:38:44 INFO server.NIOServerCnxnFactory: binding to port 
 0.0.0.0/0.0.0.0:2181
 13/09/03 10:38:44 INFO server.NIOServerCnxnFactory: binding to port 
 0.0.0.0/0.0.0.0:2181
 13/09/03 10:38:44 ERROR master.HMasterCommandLine: Master exiting
 java.io.IOException: Too many open files
 at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
 at sun.nio.ch.EPollArrayWrapper.init(EPollArrayWrapper.java:87)
 at sun.nio.ch.EPollSelectorImpl.init(EPollSelectorImpl.java:68)
 at 
 sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
 at java.nio.channels.Selector.open(Selector.java:227)
 at 
 org.apache.zookeeper.server.NIOServerCnxnFactory.init(NIOServerCnxnFactory.java:61)
 at 
 org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:165)
 at 
 org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:131)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:164)
 at 
 org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:134)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at 
 org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
 at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2812)
 {noformat}
 The reason is that MiniZookeeperCluster.selectClientPort returns 2181 if 
 defaultClientPort is greater than 0, which it always is when starting a 
 LocalHBaseCluster.
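
 A rough sketch of the fail-fast alternative: if a fixed client port is configured and something is already bound to it, give up at once instead of retrying the same port. The class and method below are hypothetical and only reuse the names selectClientPort and defaultClientPort from the description; this is not the MiniZooKeeperCluster code.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Illustrative sketch only; not the MiniZooKeeperCluster implementation.
class ClientPortSelector {
    // True if a listening socket can currently be bound to the port.
    static boolean isFree(int port) {
        try (ServerSocket probe = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Hypothetical rule: a configured port is used only if it is free;
    // otherwise fail immediately rather than retrying the same port.
    static int selectClientPort(int defaultClientPort) {
        if (defaultClientPort > 0) {
            if (!isFree(defaultClientPort)) {
                throw new IllegalStateException(
                    "Port " + defaultClientPort + " is already in use");
            }
            return defaultClientPort;
        }
        // No fixed port configured: let the OS pick a free ephemeral port.
        try (ServerSocket any = new ServerSocket(0)) {
            return any.getLocalPort();
        } catch (IOException e) {
            throw new IllegalStateException("No free port available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectClientPort(0));
    }
}
```

 The probe socket is closed before the port is returned, so another process could still grab the port in between; the sketch only shows the difference between failing fast and looping.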



[jira] [Created] (HBASE-9535) Try a pool of direct byte buffers handling incoming ipc requests

2013-09-14 Thread stack (JIRA)
stack created HBASE-9535:


 Summary: Try a pool of direct byte buffers handling incoming ipc 
requests
 Key: HBASE-9535
 URL: https://issues.apache.org/jira/browse/HBASE-9535
 Project: HBase
  Issue Type: Brainstorming
Reporter: stack
Assignee: stack


ipc takes in a query by allocating a ByteBuffer of the size of the request and 
then reading off the socket into this on-heap BB.

Experiment with keeping a pool of BBs so we have some buffer reuse to cut on 
garbage generated.  Could checkout from pool in RpcServer#Reader.  Could check 
back into the pool when Handler is done just before it queues the response on 
the Responder's queue.  We should be good since, at least for now, kvs get 
copied up into MSLAB (not references) when data gets stuffed into MemStore; 
this should make it so no references left over when we check the BB back into 
the pool for use next time around.

If on-heap BBs work, we could then try direct BBs (Allocation of DBBs takes 
time so if already allocated, should be good.  GC of DBBs is a pain but if in a 
pool, we shouldn't be wanting this to happen).  The copy from socket to the DBB 
will be off-heap (should be fast).

Could start w/ the HDFS DirectBufferPool.  It is unbounded and keeps items by 
size (we might want to bypass the pool if an object is  size N).

DBBs for this task would contend w/ offheap BBs used in BlockReadLocal when 
short-circuit reading.  It'd be a bummer if we had to allocate big objects 
on-heap.  Would still be an improvement.
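
To make the checkout/checkin idea concrete, here is a minimal sketch of a fixed-size buffer pool. It is not the HBase RpcServer code nor the HDFS DirectBufferPool; the class name, the single buffer size, and the rule that requests larger than that size bypass the pool are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only; an unbounded pool of equally sized buffers.
class SimpleBufferPool {
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;
    private final boolean direct;

    SimpleBufferPool(int bufferSize, boolean direct) {
        this.bufferSize = bufferSize;
        this.direct = direct;
    }

    // Check out a buffer able to hold 'length' bytes (e.g. in a reader thread).
    ByteBuffer checkout(int length) {
        if (length > bufferSize) {
            // Assumption: oversized requests bypass the pool and go on-heap.
            return ByteBuffer.allocate(length);
        }
        ByteBuffer bb = free.poll();
        if (bb == null) {
            bb = direct ? ByteBuffer.allocateDirect(bufferSize)
                        : ByteBuffer.allocate(bufferSize);
        }
        bb.clear();
        bb.limit(length);
        return bb;
    }

    // Check a buffer back in once nothing references its contents any more.
    void checkin(ByteBuffer bb) {
        if (bb.capacity() == bufferSize && bb.isDirect() == direct) {
            free.offer(bb);
        }
        // Buffers that bypassed the pool are simply left to the GC.
    }

    // Number of buffers currently idle in the pool.
    int pooled() {
        return free.size();
    }
}
```

Usage would be along the lines of `ByteBuffer bb = pool.checkout(requestLength);` before reading off the socket and `pool.checkin(bb);` once the handler is done with the request.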





[jira] [Commented] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767687#comment-13767687
 ] 

Jonathan Hsieh commented on HBASE-9529:
---


A few updates
- JD says ReplicationAdmin is the only thing that should be public
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, org.apache.hadoop.hbase.master.RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 




[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767688#comment-13767688
 ] 

stack commented on HBASE-7525:
--

bq. Yes, it's default behavior is just align with the old one, does the all 
regions monitoring

Ok.  The original behavior is a little 'surprising' but if it has been this way 
up to this, it is fair-enough changing it.

bq. It is the internal DEBUG msg, for counting how many loop of this monitor 
instance did; It can help user to observe the monitor instance's behavior 
whether as expected

I did not understand this log message.  I did not seem to ask for more than one 
loop so seeing more than one w/o asking for it is unexpected.

bq. The option '-regionserver' (regionserver mode) is exclusive with the 
default mode (region mode), which means user can only choose to use default 
mode or regionserver mode either

Understood.  We should fix the usage to make it plainer that it is exclusive w/ 
table ops:

Usage: ./bin/hbase Canary [opts] [table1 [table2]...] | [regionserver1 
[regionserver2]..]

... or something like that.  As is, it would seem to mix the exclusive args.

Your suggestion would allow:

Canary table1 regionserver2 ,etc.

Suggest that in the usage you are more clear that it is table OR regionserver 
ops.
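
A small sketch of what "table OR regionserver" could look like in the argument handling: the positional names are read either all as tables or all as regionservers, depending on whether '-regionserver' was given. This is not the Canary parser from the patch; the class and field names are hypothetical and only the '-regionserver' option comes from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the actual Canary argument handling.
class CanaryArgs {
    final boolean regionServerMode;   // true when -regionserver was given
    final List<String> targets;       // table names or regionserver names

    CanaryArgs(boolean regionServerMode, List<String> targets) {
        this.regionServerMode = regionServerMode;
        this.targets = targets;
    }

    static CanaryArgs parse(String[] args) {
        boolean rsMode = false;
        List<String> targets = new ArrayList<>();
        for (String arg : args) {
            if (arg.equals("-regionserver")) {
                if (!targets.isEmpty()) {
                    // The mode must be chosen before any names are listed.
                    throw new IllegalArgumentException(
                        "-regionserver must come before any target names");
                }
                rsMode = true;
            } else if (arg.startsWith("-")) {
                throw new IllegalArgumentException("Unknown option: " + arg);
            } else {
                targets.add(arg);
            }
        }
        return new CanaryArgs(rsMode, targets);
    }

    public static void main(String[] args) {
        CanaryArgs parsed = parse(args);
        System.out.println((parsed.regionServerMode ? "regionservers: " : "tables: ")
            + parsed.targets);
    }
}
```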

 A canary monitoring program specifically for regionserver
 -

 Key: HBASE-7525
 URL: https://issues.apache.org/jira/browse/HBASE-7525
 Project: HBase
  Issue Type: New Feature
  Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Critical
 Fix For: 0.98.0

 Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, 
 HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, 
 HBASE-7525-trunk-v2.patch, HBASE-7525-v0.patch, RegionServerCanary.java


 *Motivation*
 This ticket is to provide a canary monitoring tool specifically for 
 HRegionserver, details as follows
 1. The operations team asked for this tool because they found a canary for 
 every region of an HBase cluster too fine-grained, so I implemented this 
 coarse-grained one for them, based on the original o.a.h.h.tool.Canary.
 2. The tool is multi-threaded, which means each Get request is sent by its own 
 thread. I did it this way because we have been hitting a region server hang 
 whose root cause is still not clear, so this tool can help the operations team 
 detect a hung region server, if any.
 *example*
 1. the tool docs
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
 Usage: [opts] [regionServerName 1 [regionServerName 2...]]
  regionServerName - FQDN serverName, can use linux command:hostname -f to 
 check your serverName
  where [-opts] are:
    -help         Show this help and exit.
    -e            Use regionServerName as regular expression
                  which means the regionServerName is regular expression pattern
    -f B          stop whole program if first error occurs, default is true
    -t N          timeout for a check, default is 60 (millisecs)
    -daemon       Continuous check at defined intervals.
    -interval N   Interval between checks (sec)
 2. Will send a request to each regionserver in a HBase cluster
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
 3. Will send a request to a regionserver by given name
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
 4. Will send a request to regionserver(s) by given regular-expression
 /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e 
 rs1.domainname.pattern
 // another example
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e 
 tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
 5. Will send a request to a regionserver and also set a timeout limit for 
 this test
 // query regionserver:rs1.domainname with timeout limit 10sec
 // -f false, means that will not exit this program even test failed
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 1 
 rs1.domainname
 // echo 1 if timeout
 echo $?
 6. Will run as daemon mode, which means it will send request to each 
 regionserver periodically
 ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon
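
The thread-per-request idea with a per-check timeout, as described above, can 
be sketched roughly as follows. This is not the code in the attached 
RegionServerCanary.java; TimedProbeSketch, probeAll and the probeFor callback 
(which stands in for the real Get request) are hypothetical names, and the 
sketch assumes one shared deadline for the whole round of checks.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical sketch of "one thread per regionserver, with a timeout". */
final class TimedProbeSketch {
  /**
   * Runs one probe per server on its own thread and reports, per server,
   * whether the probe finished without error before the shared deadline.
   */
  static Map<String, Boolean> probeAll(List<String> servers,
      Function<String, Callable<Void>> probeFor, long timeoutMs) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
    try {
      Map<String, Future<Void>> pending = new LinkedHashMap<>();
      for (String server : servers) {
        pending.put(server, pool.submit(probeFor.apply(server)));
      }
      long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
      Map<String, Boolean> answered = new LinkedHashMap<>();
      for (Map.Entry<String, Future<Void>> e : pending.entrySet()) {
        try {
          // Wait only for the time left until the deadline; a non-positive
          // wait fails at once if the probe has not finished yet.
          e.getValue().get(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
          answered.put(e.getKey(), true);
        } catch (TimeoutException | ExecutionException ex) {
          answered.put(e.getKey(), false); // hung or failed probe
        } catch (InterruptedException ex) {
          Thread.currentThread().interrupt();
          answered.put(e.getKey(), false);
        }
      }
      return answered;
    } finally {
      pool.shutdownNow(); // interrupts probes that are still stuck
    }
  }
}
```

Because each probe has its own thread, one hung region server does not delay 
the checks against the others; it only shows up as a false entry in the result.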



[jira] [Commented] (HBASE-8810) Bring in code constants in line with default xml's

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767690#comment-13767690
 ] 

stack commented on HBASE-8810:
--

Purging the unused would be coolio and fixing up the mismatches would help too.

 Bring in code constants in line with default xml's
 --

 Key: HBASE-8810
 URL: https://issues.apache.org/jira/browse/HBASE-8810
 Project: HBase
  Issue Type: Bug
  Components: Usability
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: 8810.txt, 8810v2.txt, 
 hbase-default_to_java_constants.xsl, HBaseDefaultXMLConstants.java


 After the defaults were changed in the xml some constants were left the same.
 DEFAULT_HBASE_CLIENT_PAUSE for example.
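
 A mismatch of this kind can be found mechanically by comparing the code 
 constants with the values in the default xml. What follows is a minimal sketch 
 of such a check, assuming the usual configuration/property/name/value layout 
 of the xml; it is not the attached xsl or HBaseDefaultXMLConstants.java. The 
 class DefaultsDriftSketch and its methods are hypothetical, and the caller is 
 assumed to supply the code constants as a key to value map.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: report code constants that disagree with the xml. */
final class DefaultsDriftSketch {
  /** Reads name/value pairs from a configuration document given as a string. */
  static Map<String, String> parseDefaults(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList props = doc.getElementsByTagName("property");
      Map<String, String> out = new LinkedHashMap<>();
      for (int i = 0; i < props.getLength(); i++) {
        Element p = (Element) props.item(i);
        // Assumes every property has both a name and a value element.
        String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
        String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
        out.put(name, value);
      }
      return out;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse defaults xml", e);
    }
  }

  /** Returns key -> "code=X xml=Y" for each constant that differs from the xml. */
  static Map<String, String> mismatches(Map<String, String> codeConstants,
      Map<String, String> xmlDefaults) {
    Map<String, String> out = new TreeMap<>();
    for (Map.Entry<String, String> e : codeConstants.entrySet()) {
      String xmlValue = xmlDefaults.get(e.getKey());
      if (xmlValue != null && !xmlValue.equals(e.getValue())) {
        out.put(e.getKey(), "code=" + e.getValue() + " xml=" + xmlValue);
      }
    }
    return out;
  }
}
```

 Constants whose key does not appear in the xml at all are skipped here; they 
 would need a separate pass to decide whether they are unused.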



[jira] [Updated] (HBASE-6581) Build with hadoop.profile=3.0

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6581:
-

 Priority: Critical  (was: Major)
Fix Version/s: 0.98.0

Making critical 0.98.  Could come into 0.96 too.  Just needs someone taking it 
for a run on cluster making sure it basically works.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
Priority: Critical
 Fix For: 0.98.0

 Attachments: HBASE-6581-1.patch, HBASE-6581-20130821.patch, 
 HBASE-6581-2.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581.diff, HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 change in the hadoop maven modules naming (and also usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of hadoop dependencies in their 
 respective profiles and will define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if that's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



[jira] [Comment Edited] (HBASE-9529) Audit of hbase-client @InterfaceAudience.Public apis

2013-09-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767687#comment-13767687
 ] 

Jonathan Hsieh edited comment on HBASE-9529 at 9/15/13 5:23 AM:


A few updates
- JD says ReplicationAdmin is the only thing that should be public in the 
replication client packages.
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, org.apache.hadoop.hbase.master.RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 


  was (Author: jmhsieh):

A few updates
- JD says ReplicationAdmin is the only thing that should be public
- Make ClientScanner#getScannerCallable private (it returns a ScannerCallable 
which should be private)



No markings mean currently public and should remain public. 


Will make all of these private:
Action, ConnectionUtils, HConnectable, HTableUtil, MultiAction, MultiResponse, 
ScanMetrics, ColumnInterpreter

These are currently public and will remain public:
Mutation, Operation, OperationWithAttributes

RegionState is exposed by ClusterStatus in {{public 
Map<String, org.apache.hadoop.hbase.master.RegionState> 
getRegionsInTransition()}}.  We could open RegionState or hide just the 
ClusterStatus#getRegionsInTransition method.  I lean towards keeping it 
exposed. 

  
 Audit of hbase-client @InterfaceAudience.Public apis
 

 Key: HBASE-9529
 URL: https://issues.apache.org/jira/browse/HBASE-9529
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
 Fix For: 0.98.0, 0.96.0


 Similar to HBASE-9523, let's do an audit of the hbase-client public api.  
 This is easier to do now that we can publish only the public api javadoc 
 http://hbase.apache.org/apidocs/  (notice it only has Public apis now!)



[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767692#comment-13767692
 ] 

stack commented on HBASE-8143:
--

There is no way of getting a direct byte buffer w/o it being counted against 
the commit charge for the process?  It's a pity given we are just doing 
read-only.

All of this off-heap allocation will impinge on our being able to use off heap 
for other purposes.

 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
 --

 Key: HBASE-8143
 URL: https://issues.apache.org/jira/browse/HBASE-8143
 Project: HBase
  Issue Type: Bug
  Components: hadoop2
Affects Versions: 0.98.0, 0.94.7, 0.95.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.0, 0.94.13

 Attachments: OpenFileTest.java


 We've run into an issue with HBase 0.94 on Hadoop2 with SSR turned on: the 
 memory usage of the HBase process grows to 7g on an -Xmx3g and, after some 
 time, this causes OOM for the RSs. 
 Upon further investigation, I've found out that we end up with 200 regions, 
 each having 3-4 store files open. Under hadoop2 SSR, BlockReaderLocal 
 allocates DirectBuffers, which is unlike HDFS 1 where there is no direct 
 buffer allocation. 
 It seems that there are no guards against the memory used by local buffers in 
 hdfs 2, and having a large number of open files causes multiple GB of memory 
 to be consumed from the RS process. 
 This issue is to further investigate what is going on, and whether we can limit 
 the memory usage in HDFS, or HBase, and/or document the setup. 
 Possible mitigation scenarios are: 
  - Turn off SSR for Hadoop 2
  - Ensure that there is enough unallocated memory for the RS based on 
 expected # of store files
  - Ensure that there is lower number of regions per region server (hence 
 number of open files)
 Stack trace:
 {code}
 org.apache.hadoop.hbase.DroppedSnapshotException: region: 
 IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:632)
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
 at 
 org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
 at 
 org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
 at 
 org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
 at 
 org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
 at java.io.DataInputStream.readFully(DataInputStream.java:178)
 at 
 org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
 at 
 org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
 at 
 org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
 at 
 org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
 at 
 org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
 at 
 org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
 at 
 

[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767693#comment-13767693
 ] 

Lars Hofhansl commented on HBASE-8143:
--

With a reasonable buffer size it should be OK. 1mb is clearly 
counterproductive.
It's on my (long) list of things to test with a much smaller buffer size 
(like 4 or 8k) and see the impact of that.

At work we have this set to 128k and that has been working well.
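
To put rough numbers on this, here is a back-of-the-envelope sketch using the 
figures from the issue description (200 regions, 3-4 open store files each). 
It assumes one direct buffer per open store file; the real number of buffers 
per reader may differ, so buffersPerFile is left as a parameter, and the class 
name DirectBufferEstimateSketch is made up for the example.

```java
/** Hypothetical sketch: rough direct-memory footprint of local block readers. */
final class DirectBufferEstimateSketch {
  /** Total bytes = regions * store files per region * buffers per file * buffer size. */
  static long estimateBytes(int regions, int storeFilesPerRegion,
      int buffersPerFile, long bufferSizeBytes) {
    return (long) regions * storeFilesPerRegion * buffersPerFile * bufferSizeBytes;
  }

  public static void main(String[] args) {
    // Buffer sizes mentioned in this thread: 1mb, 128k and 8k.
    long[] sizes = {1024L * 1024, 128L * 1024, 8L * 1024};
    for (long size : sizes) {
      long total = estimateBytes(200, 4, 1, size);
      System.out.printf("buffer=%d KiB -> about %d MiB of direct memory%n",
          size / 1024, total / (1024 * 1024));
    }
  }
}
```

Under these assumptions, 200 regions with 4 store files each come to about 
800 MiB of direct buffers at 1mb, and about 100 MiB at 128k.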


[jira] [Created] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)
stack created HBASE-9536:


 Summary: Fix minor javadoc warnings
 Key: HBASE-9536
 URL: https://issues.apache.org/jira/browse/HBASE-9536
 Project: HBase
  Issue Type: Bug
Reporter: stack






[jira] [Updated] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9536:
-

Attachment: 9536.txt

A few warnings on trunk.

 Fix minor javadoc warnings
 --

 Key: HBASE-9536
 URL: https://issues.apache.org/jira/browse/HBASE-9536
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 9536.txt






[jira] [Updated] (HBASE-9536) Fix minor javadoc warnings

2013-09-14 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-9536:
-

  Component/s: documentation
  Description: I applied the trunk patch.  Let me check 0.96 for warnings 
too.
Fix Version/s: 0.98.0
 Assignee: stack

 Fix minor javadoc warnings
 --

 Key: HBASE-9536
 URL: https://issues.apache.org/jira/browse/HBASE-9536
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 9536.txt


 I applied the trunk patch.  Let me check 0.96 for warnings too.



[jira] [Commented] (HBASE-9480) Regions are unexpectedly made offline in certain failure conditions

2013-09-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767696#comment-13767696
 ] 

Lars Hofhansl commented on HBASE-9480:
--

[~jxiang], is this an issue in 0.94 as well?

 Regions are unexpectedly made offline in certain failure conditions
 ---

 Key: HBASE-9480
 URL: https://issues.apache.org/jira/browse/HBASE-9480
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Jimmy Xiang
 Fix For: 0.98.0, 0.96.0

 Attachments: 9480-1.txt, trunk-9480.patch, trunk-9480_v1.1.patch, 
 trunk-9480_v1.2.patch, trunk-9480_v2.patch


 Came across this issue (HBASE-9338 test):
 1. Client issues a request to move a region from ServerA to ServerB
 2. ServerA is compacting that region and doesn't close region immediately. In 
 fact, it takes a while to complete the request.
 3. The master, in the meantime, sends another close request.
 4. ServerA sends it a NotServingRegionException
 5. Master handles the exception, deletes the znode, and invokes regionOffline 
 for the said region.
 6. ServerA fails to operate on ZK in the CloseRegionHandler since the node is 
 deleted.
 The region is permanently offline.
 There are potentially other situations where, when a RegionServer is offline 
 and the client asks for a region to be moved off that server, the master makes 
 the region offline.



[jira] [Commented] (HBASE-9461) Some doc and cleanup in RPCServer

2013-09-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767697#comment-13767697
 ] 

Hudson commented on HBASE-9461:
---

SUCCESS: Integrated in HBase-TRUNK #4508 (See 
[https://builds.apache.org/job/HBase-TRUNK/4508/])
HBASE-9461 Some doc and cleanup in RPCServer (stack: rev 1523386)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/CallRunner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/FifoRpcScheduler.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RequestContext.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcScheduler.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcSchedulerContext.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServerInterface.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/SimpleRpcScheduler.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestCallRunner.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestIPC.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestSimpleRpcScheduler.java


 Some doc and cleanup in RPCServer
 -

 Key: HBASE-9461
 URL: https://issues.apache.org/jira/browse/HBASE-9461
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.98.0

 Attachments: 9461.txt, 9461v2.txt, ipc2.ucls


 RPC is a dog to follow.  I want to do buffer pooling for reading requests but 
 its tough drawing the diagram of who is doing what when.  HBASE-8884 seems to 
 have made it more involved still.  This issue is about doing a bit of 
 untangling.



[jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

2013-09-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767699#comment-13767699
 ] 

stack commented on HBASE-8143:
--

Just saying we will have to balance this sizing amongst the different needs.  
4k or 8k might work for the local block reader but might not be appropriate for 
something like HBASE-9535 (or any other feature we'd want to do off-heap).

 HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM 
 --

 Key: HBASE-8143
 URL: https://issues.apache.org/jira/browse/HBASE-8143
 Project: HBase
  Issue Type: Bug
  Components: hadoop2
Affects Versions: 0.98.0, 0.94.7, 0.95.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.98.0, 0.96.0, 0.94.13

 Attachments: OpenFileTest.java


 We've run into an issue with HBase 0.94 on Hadoop2 with SSR turned on: the 
 memory usage of the HBase process grows to 7g on an -Xmx3g and, after some 
 time, this causes OOM for the RSs. 
 Upon further investigation, I've found out that we end up with 200 regions, 
 each having 3-4 store files open. Under hadoop2 SSR, BlockReaderLocal 
 allocates DirectBuffers, which is unlike HDFS 1 where there is no direct 
 buffer allocation. 
 It seems that there are no guards against the memory used by local buffers in 
 hdfs 2, and having a large number of open files causes multiple GB of memory 
 to be consumed from the RS process. 
 This issue is to further investigate what is going on, and whether we can limit 
 the memory usage in HDFS, or HBase, and/or document the setup. 
 Possible mitigation scenarios are: 
  - Turn off SSR for Hadoop 2
  - Ensure that there is enough unallocated memory for the RS based on 
 expected # of store files
  - Ensure that there is lower number of regions per region server (hence 
 number of open files)
 Stack trace:
 {code}
 org.apache.hadoop.hbase.DroppedSnapshotException: region: 
 IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.lang.OutOfMemoryError: Direct buffer memory
 at java.nio.Bits.reserveMemory(Bits.java:632)
 at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
 at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
 at 
 org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
 at 
 org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
 at 
 org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
 at 
 org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
 at 
 org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
 at java.io.DataInputStream.readFully(DataInputStream.java:178)
 at 
 org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
 at 
 org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
 at 
 org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
 at 
 org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
 at 
 org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
 at 
 org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
 at 
 org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2209)
 at