date:20120326

[
https://issues.apache.org/jira/browse/HBASE-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238506#comment-13238506
]

Hadoop QA commented on HBASE-5564:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12519960/HBASE-5564_trunk.2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1305//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1305//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1305//console

This message is automatically generated.

Bulkload is discarding duplicate records

Key: HBASE-5564
URL: https://issues.apache.org/jira/browse/HBASE-5564
Project: HBase
Issue Type: Bug
Components: mapreduce
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Environment: HBase 0.92
Reporter: Laxman
Assignee: Laxman
Labels: bulkloader
Fix For: 0.96.0

Attachments: 5564.lint, HBASE-5564_trunk.1.patch,
HBASE-5564_trunk.1.patch, HBASE-5564_trunk.2.patch, HBASE-5564_trunk.patch

Duplicate records are getting discarded when duplicate records exists in same
input file and more specifically if they exists in same split.
Duplicate records are considered if the records are from diffrent different
splits.
Version under test: HBase 0.92

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-26 Thread Alex Newman (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238513#comment-13238513
 ] 

Alex Newman commented on HBASE-2600:


I really have no idea what's going on here. I can't seem to create patch from 
svn or git. Also, I've noticed the patch does have the binary snippet, svn and 
patch just aren't applying it. My jenkins job runs out of memory(/dev/shm). So 
I gave the build machine a reboot and the branch at 
http://github.com/posix4e/hbase (branch jenkins) built fine. Can someone with a 
commit bit just pull it into a svn branch?

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, 
 HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to this, the randomid component of a region name has been the timestamp of 
 region creation.   HBASE-2531 32-bit encoding of regionnames waaay 
 too susceptible to hash clashes proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5097) RegionObserver implementation whose preScannerOpen and postScannerOpen Impl return null can stall the system initialization through NPE


[ 
https://issues.apache.org/jira/browse/HBASE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238532#comment-13238532
 ] 

Lars Hofhansl commented on HBASE-5097:
--

Feel free to put it in 0.94.0. Will cut an RC today.

 RegionObserver implementation whose preScannerOpen and postScannerOpen Impl 
 return null can stall the system initialization through NPE
 ---

 Key: HBASE-5097
 URL: https://issues.apache.org/jira/browse/HBASE-5097
 Project: HBase
  Issue Type: Bug
  Components: coprocessors
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-5097.patch, HBASE-5097_1.patch, HBASE-5097_2.patch


 In HRegionServer.java openScanner()
 {code}
   r.prepareScanner(scan);
   RegionScanner s = null;
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().preScannerOpen(scan);
   }
   if (s == null) {
 s = r.getScanner(scan);
   }
   if (r.getCoprocessorHost() != null) {
 s = r.getCoprocessorHost().postScannerOpen(scan, s);
   }
 {code}
 If we dont have implemention for postScannerOpen the RegionScanner is null 
 and so throwing nullpointer 
 {code}
 java.lang.NullPointerException
   at 
 java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.addScanner(HRegionServer.java:2282)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2272)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1326)
 {code}
 Making this defect as blocker.. Pls feel free to change the priority if am 
 wrong.  Also correct me if my way of trying out coprocessors without 
 implementing postScannerOpen is wrong.  Am just a learner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238534#comment-13238534
 ] 

Lars Hofhansl commented on HBASE-5623:
--

@Enis: Please have a look when you get a chance. I would like to cut an RC of 
0.94 today.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA

[jira] [Updated] (HBASE-5633) NPE reading ZK config in HBase


 [ 
https://issues.apache.org/jira/browse/HBASE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5633:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Let's move the 0.90 and 0.92 into a sub-task, so that this issue can be kept 
closed.

 NPE reading ZK config in HBase
 --

 Key: HBASE-5633
 URL: https://issues.apache.org/jira/browse/HBASE-5633
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch, 
 HBASE-5633-v1.patch, HBASE-5633-v2.patch


 If zoo.cfg contains server.* (server.0=server0:2888:3888\n) and 
 cluster.distributed property (in hbase-site.xml) is empty we get an NPE in 
 parseZooCfg().
 The easy way to reproduce the bug is running 
 org.apache.hbase.zookeeper.TestHQuorumPeer with hbase-site.xml containing:
 {code}
 property
   namehbase.cluster.distributed/name
   value/value
 /property
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5615) the master never does balance because of balancing the parent region


 [ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5615:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 the master never do balance becauseof when master do rebuildUserRegions()，it 
 will add the parent region into  AssignmentManager#servers,
 if balancer let the parent region to move,the parent will in RIT forever.thus 
 balance will never be executed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5636) TestTableMapReduce doesn't work properly.

2012-03-26 Thread Takuya Ueshin (Created) (JIRA)

TestTableMapReduce doesn't work properly.
-

 Key: HBASE-5636
 URL: https://issues.apache.org/jira/browse/HBASE-5636
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.92.1
Reporter: Takuya Ueshin


No map function is called because there are no test data put before test starts.

The following three tests are in the same situation:
- org.apache.hadoop.hbase.mapred.TestTableMapReduce
- org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
- org.apache.hadoop.hbase.mapreduce.TestMulitthreadedTableMapper


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


 [ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5637:
--

Attachment: hbase-5637.patch

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.

2012-03-26 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238565#comment-13238565
 ] 

Jonathan Hsieh commented on HBASE-5637:
---

This test case does not exist in 0.92/0.94/trunk and is not relevent to it.

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


 [ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5637:
--

Status: Patch Available  (was: Open)

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


[ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238568#comment-13238568
 ] 

stack commented on HBASE-5637:
--

+1

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.

2012-03-26 Thread David S. Wang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238571#comment-13238571
 ] 

David S. Wang commented on HBASE-5637:
--

This looks good to me.

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HBASE-3813) Change RPC callQueue size from handlerCount * MAX_QUEUE_SIZE_PER_HANDLER;

2012-03-26 Thread Jean-Daniel Cryans (Assigned) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jean-Daniel Cryans reassigned HBASE-3813:
-

Assignee: stack

Change RPC callQueue size from handlerCount * MAX_QUEUE_SIZE_PER_HANDLER;
---

Key: HBASE-3813
URL: https://issues.apache.org/jira/browse/HBASE-3813
Project: HBase
Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
Priority: Critical
Fix For: 0.92.0

Attachments: 3813.txt

Yesterday debugging w/ Jack we noticed that with few handlers on a big box,
he was seeing stats like this:
{code}
2011-04-21 11:54:49,451 DEBUG org.apache.hadoop.ipc.HBaseServer: Server
connection from X.X.X.X:60931; # active connections: 11; # queued calls: 2500
{code}
We had 2500 items in the rpc queue waiting to be processed.
Turns out he had too few handlers for number of clients (but also, it seems
like he figured hw issues in that his RAM bus was running at 1/4 the rate
that it should have been running at).
Chatting w/ J-D this morning, he asked if the queues hold 'data'. The queues
hold 'Calls'. Calls are the client request. They contain data.
Jack had 2500 items queued. If each item to insert was 1MB, thats 25k * 1MB
of memory that is outside of our generally accounting.
Currently the queue size is handlers * MAX_QUEUE_SIZE_PER_HANDLER where
MAX_QUEUE_SIZE_PER_HANDLER is hardcoded to be 100.
If the queue is full we block (LinkedBlockingQueue).
Going to change the queue size from 100 to 10 by default -- but also will
make it configurable and will doc. this as possible cause of OOME. Will try
it on production here before committing patch.

[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209

2012-03-26 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238591#comment-13238591
 ] 

Jonathan Hsieh commented on HBASE-5596:
---

+1, lgtm.  Since this is an addendum to HBASE-5209, I've tweaked it to work on 
0.94 and 0.92.  I'll run them through test suites and if they pass tests I'll 
commit.  



 Few minor bugs from HBASE-5209
 --

 Key: HBASE-5596
 URL: https://issues.apache.org/jira/browse/HBASE-5596
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch


 A few leftover bugs from HBASE-5209.  Comments are documented here:
 https://reviews.apache.org/r/3892/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5596) Few minor bugs from HBASE-5209


 [ 
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5596:
--

Attachment: hbase-5596-0.94.patch

one-line change to apply to 0.94/0.92

 Few minor bugs from HBASE-5209
 --

 Key: HBASE-5596
 URL: https://issues.apache.org/jira/browse/HBASE-5596
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch


 A few leftover bugs from HBASE-5209.  Comments are documented here:
 https://reviews.apache.org/r/3892/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


[ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238589#comment-13238589
 ] 

Hadoop QA commented on HBASE-5637:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519976/hbase-5637.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1306//console

This message is automatically generated.

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HBASE-5190) Limit the IPC queue size based on calls' payload size

2012-03-26 Thread Jean-Daniel Cryans (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jean-Daniel Cryans resolved HBASE-5190.
---

Resolution: Fixed

Committed the addendum, thanks for looking at this Ted. Also, sorry I forgot
about security.

Limit the IPC queue size based on calls' payload size
-

Key: HBASE-5190
URL: https://issues.apache.org/jira/browse/HBASE-5190
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Fix For: 0.94.0, 0.96.0

Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch,
HBASE-5190.patch

Currently we limit the number of calls in the IPC queue only on their count.
It used to be really high and was dropped down recently to num_handlers * 10
(so 100 by default) because it was easy to OOME yourself when huge calls were
being queued. It's still possible to hit this problem if you use really big
values and/or a lot of handlers, so the idea is that we should take into
account the payload size. I can see 3 solutions:
- Do the accounting outside of the queue itself for all calls coming in and
out and when a call doesn't fit, throw a retryable exception.
- Same accounting but instead block the call when it comes in until space is
made available.
- Add a new parameter for the maximum size (in bytes) of a Call and then set
the size the IPC queue (in terms of the number of items) so that it could
only contain as many items as some predefined maximum size (in bytes) for the
whole queue.

[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


 [ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5637:
--

Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519976/hbase-5637.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1306//console

This message is automatically generated.)

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.


 [ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5637:
--

   Resolution: Fixed
Fix Version/s: 0.90.7
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7

 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5637) Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.

2012-03-26 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238609#comment-13238609
 ] 

Jonathan Hsieh commented on HBASE-5637:
---

Committed.  Dave and Stack, thanks for review.

 Fix failing 0.90 TestHMsg testcase introduced by HBASE-5563.
 

 Key: HBASE-5637
 URL: https://issues.apache.org/jira/browse/HBASE-5637
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.90.7

 Attachments: hbase-5637.patch


 After committing HBASE-5128 and HBASE-5563 to the 0.90 branch, Ted noticed 
 that TestHMsg#getList started to fail consistently.  This updates the test to 
 deal with the updated equality semantics in HBASE-5563.  This fixes that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209

[
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238621#comment-13238621
]

Hadoop QA commented on HBASE-5596:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12519979/hbase-5596-0.94.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1308//console

This message is automatically generated.

Few minor bugs from HBASE-5209
--

Key: HBASE-5596
URL: https://issues.apache.org/jira/browse/HBASE-5596
Project: HBase
Issue Type: Bug
Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
Attachments: HBASE-5596.patch, hbase-5596-0.94.patch

A few leftover bugs from HBASE-5209. Comments are documented here:
https://reviews.apache.org/r/3892/

[jira] [Created] (HBASE-5638) Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase

2012-03-26 Thread Matteo Bertozzi (Created) (JIRA)

Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase
-

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.92.1, 0.90.6
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase

2012-03-26 Thread Matteo Bertozzi (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5638:
---

Summary: Backport to 0.90 and 0.92 - NPE reading ZK config in HBase  (was: 
Backport HBASE-5633 to 0.90 and 0.92 - NPE reading ZK config in HBase)

 Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
 --

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.90.6, 0.92.1
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase

2012-03-26 Thread Matteo Bertozzi (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5638:
---

Status: Patch Available  (was: Open)

 Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
 --

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.92.1, 0.90.6
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2

 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase

2012-03-26 Thread Matteo Bertozzi (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-5638:
---

Attachment: HBASE-5633-0.92.patch
HBASE-5633-0.90.patch

I've attached the 0.90 and 0.92 patches to backport HBASE-5633 fix

 Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
 --

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.90.6, 0.92.1
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2

 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5638) Backport to 0.90 and 0.92 - NPE reading ZK config in HBase


[ 
https://issues.apache.org/jira/browse/HBASE-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238634#comment-13238634
 ] 

Hadoop QA commented on HBASE-5638:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12519985/HBASE-5633-0.92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1309//console

This message is automatically generated.

 Backport to 0.90 and 0.92 - NPE reading ZK config in HBase
 --

 Key: HBASE-5638
 URL: https://issues.apache.org/jira/browse/HBASE-5638
 Project: HBase
  Issue Type: Sub-task
  Components: zookeeper
Affects Versions: 0.90.6, 0.92.1
Reporter: Matteo Bertozzi
Priority: Minor
 Fix For: 0.90.7, 0.92.2

 Attachments: HBASE-5633-0.90.patch, HBASE-5633-0.92.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

[
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238645#comment-13238645
]

stack commented on HBASE-4676:
--

bq. The KeyValueScanner and related classes that iterate through the trie will
eventually need to be smarter and have methods to do things like skipping to
the next row of results without scanning every cell in between.

Do we not have this now Matt?

As said before, 'So 40-80x faster seeks at medium block size.', is pretty nice
improvement. Ditto on 'Seek speed is ~15x NONE seek speed while expanding the
effective block cache size by 4x.'

What are you thinking these times Matt? The slow write and scan speeds would
tend to rule it out for anything but a random read workload but when random
reading only, you'll want to use prefixtrie (smile). Do you think it possible
to approach prefix compression scan and write speeds?

Regards your working set read testing, did it all fit in memory?

How did you measure cycles per seek?

What is numBlocks? The total number of blocks the dataset fit in?

Throughput is better for sure.. what is latency like?

Excellent stuff Matt.

Prefix Compression - Trie data block encoding
-

Key: HBASE-4676
URL: https://issues.apache.org/jira/browse/HBASE-4676
Project: HBase
Issue Type: New Feature
Components: io, performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Attachments: PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf,
SeeksPerSec by blockSize.png

The HBase data block format has room for 2 significant improvements for
applications that have high block cache hit ratios.
First, there is no prefix compression, and the current KeyValue format is
somewhat metadata heavy, so there can be tremendous memory bloat for many
common data layouts, specifically those with long keys and short values.
Second, there is no random access to KeyValues inside data blocks. This
means that every time you double the datablock size, average seek time (or
average cpu consumption) goes up by a factor of 2. The standard 64KB block
size is ~10x slower for random seeks than a 4KB block size, but block sizes
as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB
or more may be more efficient from a disk access and block-cache perspective
in many big-data applications, but doing so is infeasible from a random seek
perspective.
The PrefixTrie block encoding format attempts to solve both of these
problems. Some features:
* trie format for row key encoding completely eliminates duplicate row keys
and encodes similar row keys into a standard trie structure which also saves
a lot of space
* the column family is currently stored once at the beginning of each block.
this could easily be modified to allow multiple family names per block
* all qualifiers in the block are stored in their own trie format which
caters nicely to wide rows. duplicate qualifers between rows are eliminated.
the size of this trie determines the width of the block's qualifier
fixed-width-int
* the minimum timestamp is stored at the beginning of the block, and deltas
are calculated from that. the maximum delta determines the width of the
block's timestamp fixed-width-int
The block is structured with metadata at the beginning, then a section for
the row trie, then the column trie, then the timestamp deltas, and then then
all the values. Most work is done in the row trie, where every leaf node
(corresponding to a row) contains a list of offsets/references corresponding
to the cells in that row. Each cell is fixed-width to enable binary
searching and is represented by [1 byte operationType, X bytes qualifier
offset, X bytes timestamp delta offset].
If all operation types are the same for a block, there will be zero per-cell
overhead. Same for timestamps. Same for qualifiers when i get a chance.
So, the compression aspect is very strong, but makes a few small sacrifices
on VarInt size to enable faster binary searches in trie fan-out nodes.
A more compressed but slower version might build on this by also applying
further (suffix, etc) compression on the trie nodes at the cost of slower
write speed. Even further compression could be obtained by using all VInts
instead of FInts with a sacrifice on random seek speed (though not huge).
One current drawback is the current write speed. While programmed with good
constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not
programmed with the same level of optimization as the read path. Work will
need to be done to optimize the data structures used for encoding and could
probably show a 10x increase. It will still be slower than delta encoding,

[jira] [Commented] (HBASE-5533) Add more metrics to HBase

[
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238647#comment-13238647
]

Hadoop QA commented on HBASE-5533:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12519846/HBASE-5533-TRUNK-v6.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
org.apache.hadoop.hbase.client.TestFromClientSide

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1307//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1307//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1307//console

This message is automatically generated.

Add more metrics to HBase
-

Key: HBASE-5533
URL: https://issues.apache.org/jira/browse/HBASE-5533
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch,
HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java,
hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch,
hbase5533-0.92-v5.patch, histogram_web_ui.png

To debug/monitor production clusters, there are some more metrics I wish I
had available.
In particular:
- Although the average FS latencies are useful, a 'histogram' of recent
latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc)
would be more useful
- Similar histograms of latencies on common operations (GET, PUT, DELETE)
would be useful
- Counting the number of accesses to each region to detect hotspotting
- Exposing the current number of HLog files

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238664#comment-13238664
 ] 

Lars Hofhansl commented on HBASE-5623:
--

I do not want to wait with the RC any longer.
@Stack or @Ted: Please have a look at the last patch (will add LOG.debug 
message as indicated in the last comment).

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by

[jira] [Assigned] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

2012-03-26 Thread Uma Maheswara Rao G (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G reassigned HBASE-5598:
--

Assignee: Uma Maheswara Rao G

 Analyse and fix the findbugs reporting by QA and add invalid bugs into 
 findbugs-excludeFilter file
 --

 Key: HBASE-5598
 URL: https://issues.apache.org/jira/browse/HBASE-5598
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

 There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to 
 up the OK count.
 This may lead to other issues when we re-factor the code, if we induce new 
 valid ones and remove invalid bugs also can not be reported by QA.
 So, I would propose to add the exclude filter file for findbugs(for the 
 invalid bugs). If we find any valid ones, we can fix under this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238675#comment-13238675
 ] 

Ted Yu commented on HBASE-5623:
---

Last patch should be good to go.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238680#comment-13238680
 ] 

stack commented on HBASE-5623:
--

Is this right?

{code}
+Writer tempWriter;
 synchronized (this.updateLock) {
   if (this.closed) return;
+  tempWriter = this.writer; // guaranteed non-null
 }
{code}

When we go to use tempWriter, it may be closed.  We're fine w/ that?  We'll get 
an IOE about it being closed

Yeah, add a log here instead:

{code}
+} catch (IOException x) {
+  // Ignore.
+  // Writer might have been closed.
+  // In any case, we either don't have to do anything,
+  // or the log will be rolled the next time.
 }
{code}

I'm +1 on commit.  Its way to easy reproducing the NPE.  The changes in 
SequenceFileLogWriter are an improvement.  I'm down w/ the retry on IOE (long 
as the retry is logged).

Good stuff.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238689#comment-13238689
 ] 

Lars Hofhansl commented on HBASE-5623:
--

Re: tempWriter. Yep, the idea is the *only* error-condition for tempWriter is 
that it could have been closed.
I'll add LOG.debug(Log roll failed and will be retried. (This is not an 
error))


 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message

[jira] [Commented] (HBASE-5190) Limit the IPC queue size based on calls' payload size

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238697#comment-13238697
 ] 

Hudson commented on HBASE-5190:
---

Integrated in HBase-0.94 #56 (See 
[https://builds.apache.org/job/HBase-0.94/56/])
HBASE-5190 Limit the IPC queue size based on calls' payload size
   (Ted's addendum) (Revision 1305469)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java


 Limit the IPC queue size based on calls' payload size
 -

 Key: HBASE-5190
 URL: https://issues.apache.org/jira/browse/HBASE-5190
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.96.0

 Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, 
 HBASE-5190.patch


 Currently we limit the number of calls in the IPC queue only on their count. 
 It used to be really high and was dropped down recently to num_handlers * 10 
 (so 100 by default) because it was easy to OOME yourself when huge calls were 
 being queued. It's still possible to hit this problem if you use really big 
 values and/or a lot of handlers, so the idea is that we should take into 
 account the payload size. I can see 3 solutions:
  - Do the accounting outside of the queue itself for all calls coming in and 
 out and when a call doesn't fit, throw a retryable exception.
  - Same accounting but instead block the call when it comes in until space is 
 made available.
  - Add a new parameter for the maximum size (in bytes) of a Call and then set 
 the size the IPC queue (in terms of the number of items) so that it could 
 only contain as many items as some predefined maximum size (in bytes) for the 
 whole queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface


[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238699#comment-13238699
 ] 

jirapos...@reviews.apache.org commented on HBASE-5619:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4054/#review6354
---


I'm looking at differences between r1 and r4 and I don't see anything new in 
HRegionProtocol.proto ..?


pom.xml
https://reviews.apache.org/r/4054/#comment13781

Can you please kill all the trailing whitespaces while you're at it?



pom.xml
https://reviews.apache.org/r/4054/#comment13779

Just do if which cygpath 2/dev/null instead of doing it on 2 lines and 
testing $?



pom.xml
https://reviews.apache.org/r/4054/#comment13782

Just do for PROTO_FILE in $UNIX_PROTO_DIR/*.proto, I don't think the ls 
is really necessary here.



src/main/proto/HRegionProtocol.proto
https://reviews.apache.org/r/4054/#comment13783

I'm not seeing that you added it.


- Benoit


On 2012-03-23 19:29:52, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4054/
bq.  ---
bq.  
bq.  (Updated 2012-03-23 19:29:52)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This is the first draft of the ProtoBuff HRegionProtocol.  The 
corresponding java vs pb method mapping is attached to the jira: 
https://issues.apache.org/jira/browse/HBASE-5443
bq.  
bq.  Please review.  I'd like to move ahead after we get to some agreement.
bq.  
bq.  
bq.  This addresses bug HBASE-5619.
bq.  https://issues.apache.org/jira/browse/HBASE-5619
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.pom.xml 10b13ef 
bq.src/main/proto/RegionAdmin.proto PRE-CREATION 
bq.src/main/proto/RegionClient.proto PRE-CREATION 
bq.src/main/proto/hbase.proto PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4054/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238704#comment-13238704
 ] 

jirapos...@reviews.apache.org commented on HBASE-5619:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4054/#review6356
---


Oh my, it's just that review board sucks, it's still showing the file as being 
added when comparing r1 and r4, even though the file is no longer in r4.

- Benoit


On 2012-03-23 19:29:52, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4054/
bq.  ---
bq.  
bq.  (Updated 2012-03-23 19:29:52)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This is the first draft of the ProtoBuff HRegionProtocol.  The 
corresponding java vs pb method mapping is attached to the jira: 
https://issues.apache.org/jira/browse/HBASE-5443
bq.  
bq.  Please review.  I'd like to move ahead after we get to some agreement.
bq.  
bq.  
bq.  This addresses bug HBASE-5619.
bq.  https://issues.apache.org/jira/browse/HBASE-5619
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.pom.xml 10b13ef 
bq.src/main/proto/RegionAdmin.proto PRE-CREATION 
bq.src/main/proto/RegionClient.proto PRE-CREATION 
bq.src/main/proto/hbase.proto PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4054/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5533) Add more metrics to HBase

2012-03-26 Thread stack (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5533:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk.  The two tests that failed seemed to have nought to do w/ 
this patch (I tried them locally and they both passed).  Thanks for the nice 
addition Shaneal.

 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, 
 hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, 
 hbase5533-0.92-v5.patch, histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5628) Improve performance of uberhbck

2012-03-26 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238721#comment-13238721
 ] 

Jonathan Hsieh commented on HBASE-5628:
---

Minor optimization: 

in HBaseFsck#loadHdfsRegionInfos we could consolidate.

{code}
if (modTInfo = null) {
 ...
}
..
tablesInfo.put(tableNAme, modTInfo);
{code}

to be 


{code}
if (modTInfo = null) {
  ...
  tablesInfo.put(tableNAme, modTInfo);
}
..
{code}

 Improve performance of uberhbck
 ---

 Key: HBASE-5628
 URL: https://issues.apache.org/jira/browse/HBASE-5628
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.7, 0.92.2, 0.94.0, 0.96.0
Reporter: Jonathan Hsieh

 During reviews of HBASE-5128 there are several opportunities investigate for 
 improving the performance of the tool.
 - Change regionInfoMap and tablesInfo from TreeMap to HashMap.
 - Change some full region set reloads to be incremental to require fewer 
 passes.
 - Cache meta for subsequent calls of closeRegionSileneglyAndWait

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238766#comment-13238766
 ] 

jirapos...@reviews.apache.org commented on HBASE-5619:
--



bq.  On 2012-03-26 19:15:13, Benoit Sigoure wrote:
bq.   pom.xml, line 780
bq.   https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line780
bq.  
bq.   Just do if which cygpath 2/dev/null instead of doing it on 2 
lines and testing $?

Sure, will fix.


bq.  On 2012-03-26 19:15:13, Benoit Sigoure wrote:
bq.   pom.xml, line 787
bq.   https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line787
bq.  
bq.   Just do for PROTO_FILE in $UNIX_PROTO_DIR/*.proto, I don't think 
the ls is really necessary here.

You are right.  It works without ls.  Fixed.


bq.  On 2012-03-26 19:15:13, Benoit Sigoure wrote:
bq.   pom.xml, line 334
bq.   https://reviews.apache.org/r/4054/diff/1-4/?file=86002#file86002line334
bq.  
bq.   Can you please kill all the trailing whitespaces while you're at it?

Sure, I will do that.


- Jimmy


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4054/#review6354
---


On 2012-03-23 19:29:52, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4054/
bq.  ---
bq.  
bq.  (Updated 2012-03-23 19:29:52)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This is the first draft of the ProtoBuff HRegionProtocol.  The 
corresponding java vs pb method mapping is attached to the jira: 
https://issues.apache.org/jira/browse/HBASE-5443
bq.  
bq.  Please review.  I'd like to move ahead after we get to some agreement.
bq.  
bq.  
bq.  This addresses bug HBASE-5619.
bq.  https://issues.apache.org/jira/browse/HBASE-5619
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.pom.xml 10b13ef 
bq.src/main/proto/RegionAdmin.proto PRE-CREATION 
bq.src/main/proto/RegionClient.proto PRE-CREATION 
bq.src/main/proto/hbase.proto PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4054/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238768#comment-13238768
 ] 

jirapos...@reviews.apache.org commented on HBASE-5619:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4054/
---

(Updated 2012-03-26 20:14:22.677676)


Review request for hbase.


Changes
---

Addressed Benoit's comments.


Summary
---

This is the first draft of the ProtoBuff HRegionProtocol.  The corresponding 
java vs pb method mapping is attached to the jira: 
https://issues.apache.org/jira/browse/HBASE-5443

Please review.  I'd like to move ahead after we get to some agreement.


This addresses bug HBASE-5619.
https://issues.apache.org/jira/browse/HBASE-5619


Diffs (updated)
-

  pom.xml 10b13ef 
  src/main/proto/RegionAdmin.proto PRE-CREATION 
  src/main/proto/RegionClient.proto PRE-CREATION 
  src/main/proto/hbase.proto PRE-CREATION 

Diff: https://reviews.apache.org/r/4054/diff


Testing
---


Thanks,

Jimmy



 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()

2012-03-26 Thread Ted Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5544:
--

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Add metrics to HRegion.processRow()
 ---

 Key: HBASE-5544
 URL: https://issues.apache.org/jira/browse/HBASE-5544
 Project: HBase
  Issue Type: New Feature
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch


 Add metrics of
 1. time for waiting for the lock
 2. processing time (scan time)
 3. time spent while holding the lock
 4. total call time
 5. number of failures / calls

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5544) Add metrics to HRegion.processRow()

2012-03-26 Thread Ted Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5544:
--

Fix Version/s: 0.96.0

 Add metrics to HRegion.processRow()
 ---

 Key: HBASE-5544
 URL: https://issues.apache.org/jira/browse/HBASE-5544
 Project: HBase
  Issue Type: New Feature
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.96.0

 Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch


 Add metrics of
 1. time for waiting for the lock
 2. processing time (scan time)
 3. time spent while holding the lock
 4. total call time
 5. number of failures / calls

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Prakash Khemani (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5606:
---

Attachment: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch

Do not do any error processing if the getDataSetWatch() call from 
SplitLogManager timeoutMonitor fails

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt


 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All tasks are failed due to ZK connection lost, so the all the tasks were 
 deleted asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. The asynchronously deletion in step 2 finally happened for new task
 5. This made the SplitLogManger in hanging state.
 This leads to .META. region not assigened for long time
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Prakash Khemani (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238794#comment-13238794
 ] 

Prakash Khemani commented on HBASE-5606:


@Jimmy This is similar to HBASE-5081 w.r.t what goes wrong - a pending delete 
creates havoc on the next create. But it is different from HBASE-5081 because 
the pending Delete is created at a different point in the code - in the 
timeoutMonitor and not when the task actually fails ...

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt


 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All tasks are failed due to ZK connection lost, so the all the tasks were 
 deleted asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. The asynchronously deletion in step 2 finally happened for new task
 5. This made the SplitLogManger in hanging state.
 This leads to .META. region not assigened for long time
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


 [ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5623:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.94 and 0.96.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Uma Maheswara Rao G (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238800#comment-13238800
 ] 

Lars Hofhansl commented on HBASE-5623:
--

Thanks for the patch Enis! And thanks for the reviews Stack and Ted.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238802#comment-13238802
 ] 

Jean-Daniel Cryans commented on HBASE-5623:
---

Funny, I just saw that NPE for the first time in my testing.

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

[
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238804#comment-13238804
]

Uma Maheswara Rao G commented on HBASE-5598:

I am not so favor of adding directly static tools related annotations into
code.
In Hadoop projects(HDFS, Mapreduce) we are using this exclude-filter for
invalid find bugs. So, I proposed this here.

I think we can decide first how we will exclude the invalid bugs, then we can
start working on the bug fix directly.

Analyse and fix the findbugs reporting by QA and add invalid bugs into
findbugs-excludeFilter file
--

Key: HBASE-5598
URL: https://issues.apache.org/jira/browse/HBASE-5598
Project: HBase
Issue Type: Bug
Components: scripts
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor

There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to
up the OK count.
This may lead to other issues when we re-factor the code, if we induce new
valid ones and remove invalid bugs also can not be reported by QA.
So, I would propose to add the exclude filter file for findbugs(for the
invalid bugs). If we find any valid ones, we can fix under this JIRA.

[jira] [Commented] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-03-26 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238807#comment-13238807
 ] 

Jean-Daniel Cryans commented on HBASE-3134:
---

+1 on the patch you put up on review board Teruyoshi, can you attach it here 
and grant the license so that I can commit? Thanks!

 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Assignee: Teruyoshi Zenmyo
Priority: Minor
  Labels: replication
 Fix For: 0.94.1

 Attachments: 3134-v2.txt, 3134-v3.txt, 3134.txt, HBASE-3134.patch, 
 HBASE-3134.patch, HBASE-3134.patch, HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and when want to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5533) Add more metrics to HBase


 [ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5533:
-

Fix Version/s: 0.94.0

Patch applies almost cleanup to 0.94. Will pull it in.

 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, 
 hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, 
 hbase5533-0.92-v5.patch, histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Jimmy Xiang (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238811#comment-13238811
 ] 

Jimmy Xiang commented on HBASE-5606:


@Prakash,  could there be other places which failed delete can cause this issue?

Is it a cleaner fix to change async delete to sync delete?  With sync delete, 
we can
avoid all these havoc racing problems, and the retry will get a fresh start 
each time.

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt


 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All tasks are failed due to ZK connection lost, so the all the tasks were 
 deleted asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. The asynchronously deletion in step 2 finally happened for new task
 5. This made the SplitLogManger in hanging state.
 This leads to .META. region not assigened for long time
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5533) Add more metrics to HBase

2012-03-26 Thread Uma Maheswara Rao G (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238817#comment-13238817
 ] 

Lars Hofhansl commented on HBASE-5533:
--

Committed to 0.94 as well.

 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, 
 hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, 
 hbase5533-0.92-v5.patch, histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

[
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238820#comment-13238820
]

Uma Maheswara Rao G commented on HBASE-5598:

Who ever wants to generate the findbugs html report locally, follow the below
steps.

1) add the below target in pom.xml. Already there is one transformationSet
available in pom.xml. just we can place this after that transformationSet.

transformationSet
dir${basedir}/target//dir
includes
includefindbugsXml.xml/include
/includes

stylesheetE:/SoftWares/findbugs-1.3.9/findbugs-1.3.9/src/xsl/default.xsl/stylesheet
outputDir${basedir}/target//outputDir
/transformationSet

2) Make sure to update the above findbugs xsl path correctly referring to your
local path of findbugs.

3) run 'mvn findbugs:findbugs'

4) run 'mvn site'

now ${basedir}/target/findbugsXml.xml will be replaced with html report. rename
to ${basedir}/target/findbugsXml.html and open.

Analyse and fix the findbugs reporting by QA and add invalid bugs into
findbugs-excludeFilter file
--

[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread Jimmy Xiang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5619:
---

Status: Open  (was: Patch Available)

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread Jimmy Xiang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5619:
---

Status: Patch Available  (was: Open)

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-26 Thread Jimmy Xiang (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-5619:
---

Attachment: hbase-5619_v3.patch

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch


 Subtask of HBase-5443, separate HRegionInterface into admin protocol and 
 client protocol, create the PB protocol buffer files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

2012-03-26 Thread Nicolas Spiegelberg (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238838#comment-13238838
 ] 

Nicolas Spiegelberg commented on HBASE-5626:


How is this different from the compaction simulation python script?  The unit 
of measurement should be a flush, since we flush after a certain memstore 
memory size, regardless of flow rate or KV length.

 Compactions simulator tool for proofing algorithms
 --

 Key: HBASE-5626
 URL: https://issues.apache.org/jira/browse/HBASE-5626
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Minor
  Labels: noob

 A tool to run compaction simulations would be a nice to have.   We could use 
 it to see how well an algo ran under different circumstances loaded w/ 
 different value types with different rates of flushes and splits, etc. 
 HBASE-2462 had one (see in patch).  Or we could try doing it using something 
 like this: http://en.wikipedia.org/wiki/Discrete_event_simulation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

2012-03-26 Thread Uma Maheswara Rao G (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238849#comment-13238849
]

Uma Maheswara Rao G commented on HBASE-5598:

Also another case to use exclude filter is, we have many findbugs reported from
generated code (from thrift).

We can just add that package to the filter file.

ex:
{code}
FindBugsFilter
Match
Package name=org.apache.hadoop.hbase.thrift.generated /
/Match
/FindBugsFilter
{code}

Analyse and fix the findbugs reporting by QA and add invalid bugs into
findbugs-excludeFilter file
--

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms


[ 
https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238853#comment-13238853
 ] 

stack commented on HBASE-5626:
--

Where is the python simulation script?  Is it uploaded anywhere?  (Pardon me if 
I missed it)

Simulator needs to also factor in splitting.

 Compactions simulator tool for proofing algorithms
 --

 Key: HBASE-5626
 URL: https://issues.apache.org/jira/browse/HBASE-5626
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Minor
  Labels: noob

 A tool to run compaction simulations would be a nice to have.   We could use 
 it to see how well an algo ran under different circumstances loaded w/ 
 different value types with different rates of flushes and splits, etc. 
 HBASE-2462 had one (see in patch).  Or we could try doing it using something 
 like this: http://en.wikipedia.org/wiki/Discrete_event_simulation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5544) Add metrics to HRegion.processRow()

[
https://issues.apache.org/jira/browse/HBASE-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238856#comment-13238856
]

Hadoop QA commented on HBASE-5544:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12519753/HBASE-5544.D2457.2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.replication.TestReplication
org.apache.hadoop.hbase.util.TestHBaseFsck

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1311//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1311//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1311//console

This message is automatically generated.

Add metrics to HRegion.processRow()
---

Key: HBASE-5544
URL: https://issues.apache.org/jira/browse/HBASE-5544
Project: HBase
Issue Type: New Feature
Reporter: Scott Chen
Assignee: Scott Chen
Fix For: 0.96.0

Attachments: HBASE-5544.D2457.1.patch, HBASE-5544.D2457.2.patch

Add metrics of
1. time for waiting for the lock
2. processing time (scan time)
3. time spent while holding the lock
4. total call time
5. number of failures / calls

[jira] [Commented] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238861#comment-13238861
]

Hadoop QA commented on HBASE-5606:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12520003/0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
org.apache.hadoop.hbase.mapreduce.TestImportTsv
org.apache.hadoop.hbase.mapred.TestTableMapReduce
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/1310//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/1310//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1310//console

This message is automatically generated.

SplitLogManger async delete node hangs log splitting when ZK connection is
lost

Key: HBASE-5606
URL: https://issues.apache.org/jira/browse/HBASE-5606
Project: HBase
Issue Type: Bug
Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
Fix For: 0.92.2

Attachments:
0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt

1. One rs died, the servershutdownhandler found it out and started the
distributed log splitting;
2. All tasks are failed due to ZK connection lost, so the all the tasks were
deleted asynchronously;
3. Servershutdownhandler retried the log splitting;
4. The asynchronously deletion in step 2 finally happened for new task
5. This made the SplitLogManger in hanging state.
This leads to .META. region not assigened for long time
{noformat}
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14
19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up
splitlog task at znode
/hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14
19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up
splitlog task at znode
/hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
{noformat}
{noformat}
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14
19:34:31,196 DEBUG
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted
/hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14
19:34:32,497 DEBUG
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted
/hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
{noformat}

[jira] [Updated] (HBASE-5618) SplitLogManager - prevent unnecessary attempts to resubmits

2012-03-26 Thread Prakash Khemani (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-5618:
---

Attachment: 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch

update heartbeat time as soon as possible and as often as one can.

 SplitLogManager - prevent unnecessary attempts to resubmits
 ---

 Key: HBASE-5618
 URL: https://issues.apache.org/jira/browse/HBASE-5618
 Project: HBase
  Issue Type: Improvement
  Components: wal, zookeeper
Reporter: Prakash Khemani
 Attachments: 
 0001-HBASE-5618-SplitLogManager-prevent-unnecessary-attem.patch


 Currently once a watch fires that the task node has been updated (hearbeated) 
 by the worker, the splitlogmanager still quite some time before it updates 
 the last heard from time. This is because the manager currently schedules 
 another getDataSetWatch() and only after that finishes will it update the 
 task's last heard from time.
 This leads to a large number of zk-BadVersion warnings when resubmission is 
 continuously attempted and it fails.
 Two changes should be made
 (1) On a resubmission failure because of BadVersion the task's lastUpdate 
 time should get upped.
 (2) The task's lastUpdate time should get upped as soon as the 
 nodeDataChanged() watch fires and without waiting for getDataSetWatch() to 
 complete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs


[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238893#comment-13238893
 ] 

jirapos...@reviews.apache.org commented on HBASE-5451:
--



bq.  On 2012-03-24 07:38:03, Benoit Sigoure wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java,
 line 102
bq.   https://reviews.apache.org/r/4096/diff/2/?file=86903#file86903line102
bq.  
bq.   Argh, no, don't change this!  I got other HBase devs to promise to 
not change this as it makes backwards compatible clients impossibly complicated.

I see. This was the basis of the graceful failure for current clients that 
are not aware of PB (clients would bail out if the versions of RPC don't match, 
right). The response to your comment below I don't see how this is graceful. 
is actually this change in the version.


bq.  On 2012-03-24 07:38:03, Benoit Sigoure wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto,
 line 34
bq.   https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line34
bq.  
bq.   Why keep this oddity of Hadoop RPC?  Either rely on TCP keepalive, 
or add a Ping method to the RPC interface.

Note that this is just documentation. Ping is already done in hbase RPC, and I 
thought I'd document it. I haven't done anything in the PB stuff for handling 
this. I agree with you this is odd/special-case and IMO a topic for a separate 
jira.


bq.  On 2012-03-24 07:38:03, Benoit Sigoure wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto,
 line 72
bq.   https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line72
bq.  
bq.   Why is this optional?

General comment on the optional vs required PB fields... I have made most of 
the fields as optional since it makes the specification flexible and makes 
compatibility very easy. Once we are somewhat certain of the PB fields in the 
RPC we can finalize on the labeling of optional/required on the fields. Does 
this make sense?


bq.  On 2012-03-24 07:38:03, Benoit Sigoure wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto,
 line 71
bq.   https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line71
bq.  
bq.   What's the point of this message?  Why not just put the callId in 
RpcRequestProto and be done with it?

The main reason being I wanted to clearly separate what comes from the 
application and what's put in by the RPC layer. The client would frame a PB 
object (RpcRequestProto) and send it down to the RPC layer. Currently, the 
RpcRequestProto is mostly a placeholder with only one field called 'bytes'. 
Once I implement the ProtoBufRpcEngine (as in Hadoop core) in a follow-up jira, 
I'll have fields like methodname', 'protocolname', etc. and they would be 
encoded as RpcRequestProto objects.

Similarly, on the response side.


bq.  On 2012-03-24 07:38:03, Benoit Sigoure wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto,
 line 25
bq.   https://reviews.apache.org/r/4096/diff/2/?file=86905#file86905line25
bq.  
bq.   I don't see how this is graceful.

I answered this above.


- Devaraj


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4096/#review6302
---


On 2012-03-01 03:40:14, Devaraj Das wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4096/
bq.  ---
bq.  
bq.  (Updated 2012-03-01 03:40:14)
bq.  
bq.  
bq.  Review request for .
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Switch RPC call envelope/headers to PBs
bq.  
bq.  
bq.  This addresses bug HBASE-5451.
bq.  https://issues.apache.org/jira/browse/HBASE-5451
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.http://svn.apache.org/repos/asf/hbase/trunk/pom.xml 1294899 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/DataOutputOutputStream.java
 1294899 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
 1294899 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
 1294899 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/security/User.java
 1294899 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/proto/RPCMessageProto.proto
 PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/4096/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Devaraj
bq.  
bq.

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-26 Thread Ted Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238891#comment-13238891
 ] 

Ted Yu commented on HBASE-2600:
---

@Alex:
Can you attach hbase-2600-root.dir.tgz to this JIRA ?
Please briefly describe how you generated the tar ball.

Thanks

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, 
 HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to this, the randomid component of a region name has been the timestamp of 
 region creation.   HBASE-2531 32-bit encoding of regionnames waaay 
 too susceptible to hash clashes proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5626) Compactions simulator tool for proofing algorithms

2012-03-26 Thread Nicolas Spiegelberg (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-5626:
---

Attachment: cf_compact.py

Attached the current python script that I use to emulate compactions given 
different params.

 Compactions simulator tool for proofing algorithms
 --

 Key: HBASE-5626
 URL: https://issues.apache.org/jira/browse/HBASE-5626
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Minor
  Labels: noob
 Attachments: cf_compact.py


 A tool to run compaction simulations would be a nice to have.   We could use 
 it to see how well an algo ran under different circumstances loaded w/ 
 different value types with different rates of flushes and splits, etc. 
 HBASE-2462 had one (see in patch).  Or we could try doing it using something 
 like this: http://en.wikipedia.org/wiki/Discrete_event_simulation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms

2012-03-26 Thread Nicolas Spiegelberg (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238907#comment-13238907
]

Nicolas Spiegelberg commented on HBASE-5626:

A little more explanation.

Basic Concept:
We wish to model the amount of compaction IO and file dispersion. The unit of
measurement for compactions is a flush. This is because a flush is always 64MB
(or whatever you configure) regardless of other properties about the CF/KV.
Column families might trigger flushes at different intervals, but they usually
flush a consistent amount of data. You can understand the behavior of a
compaction algorithm based upon how it behaves over X amount of flushes. Does
this test make a lot of assumptions and simplifications? Yes!

Inputs:
1. ratio = compaction.ratio between files. (same as the HBase config)
2. min.files = minimum count of files that must be selected for a compaction to
occur (same as HBase config)
3. duplication = percentage of KVs within a file that are mutations and will be
deduped on compaction (0 = DUPLICATION = 1)
4. iterations = number of flushes to simulate

Output:
1. The StoreFile dispersion after every flush (and, possibly, compaction
triggered by that flush)
2. The average storefile count over iterations flushes
3. The amount of IO consumed by compactions after those iterations flushes.

Compactions simulator tool for proofing algorithms
--

Key: HBASE-5626
URL: https://issues.apache.org/jira/browse/HBASE-5626
Project: HBase
Issue Type: Task
Reporter: stack
Priority: Minor
Labels: noob
Attachments: cf_compact.py

A tool to run compaction simulations would be a nice to have. We could use
it to see how well an algo ran under different circumstances loaded w/
different value types with different rates of flushes and splits, etc.
HBASE-2462 had one (see in patch). Or we could try doing it using something
like this: http://en.wikipedia.org/wiki/Discrete_event_simulation

[jira] [Commented] (HBASE-5626) Compactions simulator tool for proofing algorithms


[ 
https://issues.apache.org/jira/browse/HBASE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238908#comment-13238908
 ] 

stack commented on HBASE-5626:
--

Nice. Let me take a looksee...

 Compactions simulator tool for proofing algorithms
 --

 Key: HBASE-5626
 URL: https://issues.apache.org/jira/browse/HBASE-5626
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Minor
  Labels: noob
 Attachments: cf_compact.py


 A tool to run compaction simulations would be a nice to have.   We could use 
 it to see how well an algo ran under different circumstances loaded w/ 
 different value types with different rates of flushes and splits, etc. 
 HBASE-2462 had one (see in patch).  Or we could try doing it using something 
 like this: http://en.wikipedia.org/wiki/Discrete_event_simulation

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Uma Maheswara Rao G (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238910#comment-13238910
 ] 

Hudson commented on HBASE-5623:
---

Integrated in HBase-0.94 #57 (See 
[https://builds.apache.org/job/HBase-0.94/57/])
HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis 
Soztutar and LarsH) (Revision 1305549)

 Result = SUCCESS
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java


 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code}

[jira] [Updated] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

2012-03-26 Thread Alex Newman (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-2600:
---

Attachment: hbase-2600-root.dir.tgz

generate-hbase-2600-root-in-tmp.sh  was used to generate this tarball

 Change how we do meta tables; from tablename+STARTROW+randomid to instead, 
 tablename+ENDROW+randomid
 

 Key: HBASE-2600
 URL: https://issues.apache.org/jira/browse/HBASE-2600
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
 Attachments: 
 0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch, 
 0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch, 
 2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch, 
 HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, hbase-2600-root.dir.tgz, jenkins.pdf


 This is an idea that Ryan and I have been kicking around on and off for a 
 while now.
 If regionnames were made of tablename+endrow instead of tablename+startrow, 
 then in the metatables, doing a search for the region that contains the 
 wanted row, we'd just have to open a scanner using passed row and the first 
 row found by the scan would be that of the region we need (If offlined 
 parent, we'd have to scan to the next row).
 If we redid the meta tables in this format, we'd be using an access that is 
 natural to hbase, a scan as opposed to the perverse, expensive 
 getClosestRowBefore we currently have that has to walk backward in meta 
 finding a containing region.
 This issue is about changing the way we name regions.
 If we were using scans, prewarming client cache would be near costless (as 
 opposed to what we'll currently have to do which is first a 
 getClosestRowBefore and then a scan from the closestrowbefore forward).
 Converting to the new method, we'd have to run a migration on startup 
 changing the content in meta.
 Up to this, the randomid component of a region name has been the timestamp of 
 region creation.   HBASE-2531 32-bit encoding of regionnames waaay 
 too susceptible to hash clashes proposes changing the randomid so that it 
 contains actual name of the directory in the filesystem that hosts the 
 region.  If we had this in place, I think it would help with the migration to 
 this new way of doing the meta because as is, the region name in fs is a hash 
 of regionname... changing the format of the regionname would mean we generate 
 a different hash... so we'd need hbase-2531 to be in place before we could do 
 this change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

2012-03-26 Thread Uma Maheswara Rao G (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HBASE-5598:
---

Attachment: HBASE-5598.patch

 Analyse and fix the findbugs reporting by QA and add invalid bugs into 
 findbugs-excludeFilter file
 --

 Key: HBASE-5598
 URL: https://issues.apache.org/jira/browse/HBASE-5598
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HBASE-5598.patch


 There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to 
 up the OK count.
 This may lead to other issues when we re-factor the code, if we induce new 
 valid ones and remove invalid bugs also can not be reported by QA.
 So, I would propose to add the exclude filter file for findbugs(for the 
 invalid bugs). If we find any valid ones, we can fix under this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5606) SplitLogManger async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Ted Yu (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-5606:
--

Attachment: 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch

Re-attaching Prakash's patch.

 SplitLogManger async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All tasks are failed due to ZK connection lost, so the all the tasks were 
 deleted asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. The asynchronously deletion in step 2 finally happened for new task
 5. This made the SplitLogManger in hanging state.
 This leads to .META. region not assigened for long time
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file


[ 
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238924#comment-13238924
 ] 

Uma Maheswara Rao G commented on HBASE-5598:


As a initial start I updated the patch with filter file.

1) added filter file
2) excluded generated packges.
   org.apache.hadoop.hbase.thrift2.generated, 
org.apache.hadoop.hbase.thrift.generated, 
org.apache.hadoop.hbase.rest.protobuf.generated

This reduces the findbugs count from 772 to 601.

{quote}
[WARNING]
[INFO]
[INFO] 
[INFO] Building HBase 0.95-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- findbugs-maven-plugin:2.4.0:findbugs (default-cli) @ hbase ---
[INFO] Fork Value is true
 [java] Warnings generated: 601
[INFO] Done FindBugs Analysis
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 3:19.270s
[INFO] Finished at: Tue Mar 27 03:56:20 IST 2012
[INFO] Final Memory: 13M/55M
[INFO] 
{quote}

note: currently we have already increased the count of findbugs in 
test-patch.properties. While verifying, I just reverted back to 0 for testing.


 Analyse and fix the findbugs reporting by QA and add invalid bugs into 
 findbugs-excludeFilter file
 --

 Key: HBASE-5598
 URL: https://issues.apache.org/jira/browse/HBASE-5598
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HBASE-5598.patch


 There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to 
 up the OK count.
 This may lead to other issues when we re-factor the code, if we induce new 
 valid ones and remove invalid bugs also can not be reported by QA.
 So, I would propose to add the exclude filter file for findbugs(for the 
 invalid bugs). If we find any valid ones, we can fix under this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4348) Add metrics for regions in transition

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238923#comment-13238923
 ] 

jirapos...@reviews.apache.org commented on HBASE-4348:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4402/#review6204
---


Himanshu, please address the potential NPE issue.  I've added some suggestions 
to keep names consistent with HBase's conventions.

It would be really nice if you could do a test that would exercise some of the 
new code (test updates don't really seem do it).  See TestRpcMetrics, or 
TestMetricsMBeanBase as possible modes.  I won't block committing if this 
doesn't happen, but it would be nice. :)


src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
https://reviews.apache.org/r/4402/#comment13873

let's rename this to be consistent with other Configuration fields.  Check 
out HConstants.java to see the names of quite a few configuration variables to 
get a general idea of the pattern.

My suggestion is something like:
'hbase.metrics.rit.threshold.time'



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/4402/#comment13857

Either:

* javadoc to say this must not be null and add 
'Preconditions.assertNotNull(masterMetrics,master metrics should never be 
null) on initialization

* add guards where masterMetrics is deref'ed to see if null.

Seems like with your tests, adding the guard 'if !=null' guard to 
masterMetrics derefs in the next comment is easier.



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/4402/#comment13874

ditto.  Actually, since it is used in a few places, you should probably to 
add this to the HConstants file.



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/4402/#comment13858

master metrics could npe here.



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/4402/#comment13875

nit: spitting? (kind gross) maybe emitting (you use that word below) or 
publishing?



src/main/java/org/apache/hadoop/hbase/master/HMaster.java
https://reviews.apache.org/r/4402/#comment13363

nit: funny spacing



src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java
https://reviews.apache.org/r/4402/#comment13876

maybe rename to put rit in front so that it is consistent and will sort 
nicely?

maybe 'ritOldestAge'?


- jmhsieh


On 2012-03-20 21:44:10, Himanshu Vashishtha wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4402/
bq.  ---
bq.  
bq.  (Updated 2012-03-20 21:44:10)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch is for adding Region in transition metrics to the HMaster 
metrics system. It also adds these metrics in the master ui, in the Region in 
transition section. I have attached the proposed new format in the jira 4348.
bq.  
bq.  
bq.  This addresses bug HBase-4348.
bq.  https://issues.apache.org/jira/browse/HBase-4348
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
src/main/jamon/org/apache/hadoop/hbase/tmpl/master/AssignmentManagerStatusTmpl.jamon
 0dc0691 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
ae468ca 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java c4b4d30 
bq.src/main/java/org/apache/hadoop/hbase/master/metrics/MasterMetrics.java 
83abc52 
bq.src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java 
d68ce33 
bq.  
bq.  Diff: https://reviews.apache.org/r/4402/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran on a 5 node cluster and kill region servers randomly to observe the 
changes in the RIT metrics as emitted out by the Master's mxbean;
bq.  
bq.  mvn test passes without any failure.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Himanshu
bq.  
bq.



 Add metrics for regions in transition
 -

 Key: HBASE-4348
 URL: https://issues.apache.org/jira/browse/HBASE-4348
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Himanshu Vashishtha
Priority: Minor
  Labels: noob
 Attachments: 4348-metrics-v3.patch, 4348-v1.patch, 4348-v2.patch, 
 RITs.png, RegionInTransitions2.png, metrics-v2.patch


 The following metrics would be

[jira] [Updated] (HBASE-5598) Analyse and fix the findbugs reporting by QA and add invalid bugs into findbugs-excludeFilter file

2012-03-26 Thread Uma Maheswara Rao G (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HBASE-5598:
---

Status: Patch Available  (was: Open)

 Analyse and fix the findbugs reporting by QA and add invalid bugs into 
 findbugs-excludeFilter file
 --

 Key: HBASE-5598
 URL: https://issues.apache.org/jira/browse/HBASE-5598
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.1, 0.94.0, 0.96.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HBASE-5598.patch


 There are many findbugs errors reporting by HbaseQA. HBASE-5597 is going to 
 up the OK count.
 This may lead to other issues when we re-factor the code, if we induce new 
 valid ones and remove invalid bugs also can not be reported by QA.
 So, I would propose to add the exclude filter file for findbugs(for the 
 invalid bugs). If we find any valid ones, we can fix under this JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush

2012-03-26 Thread Enis Soztutar (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238938#comment-13238938
 ] 

Enis Soztutar commented on HBASE-5623:
--

Thanks Lars for pushing this. Just as a note, I just tested the final version 
of the patch on a 4 node cluster with ycsb -threads 100. No problems.  

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you

[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-03-26 Thread Matt Corgan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238948#comment-13238948
]

Matt Corgan commented on HBASE-4676:

{quote}Do we not have this now Matt?{quote}
I don't believe so. Right now, HFileReaderV2.blockSeek(..) does a brute force
loop through each KV in the block and does a full KV comparison at each KV.
Say there are 1000 KVs in the block and 20/row, on average you will have to do
500 full KV comparisons just to get to your key, and even if you are selecting
all 20 KVs in the row you still do 500 comparisons on average to get to the
start of the row. It's also very important to remember that these are long
keys and they are most likely to share a common prefix since they got sorted to
the same block, so you're comparators are churning the same prefix bytes over
and over.

{quote}The slow write and scan speeds would tend to rule it out for anything
but a random read workload but when random reading only{quote}
Yeah, it's not really targeted at long scans on cold data, but there are some
cases in between long cold scans and hot point queries. Some factors: If it's
warm data your block cache is now 6x bigger. Then there are many scans that
are really sub-block prefix scans to get all the keys in a row (the 20 of 1000
cells i mentioned above). If your scan will have a low hit ratio then the
ability to jump between rows inside a block will lessen CPU usage.

It performs best when doing lots of random Gets on hot data (in block cache).
Possibly a persistent memcached alternative, especially with SSDs. I believe
the current system is actually limited by CPU when doing high throughput Gets
on data with long keys and short values. The trie's individual request latency
may not be much different, but a single server could serve many more parallel
requests before maxing out cpu.

The bigger the value/key ratio of your data the smaller the difference between
trie and anything else. Seems like many people now have big values so I doubt
they would see a difference. I'm more trying to enable smarter schemas with
compound primary keys.

{quote}Regards your working set read testing, did it all fit in memory?{quote}
yes, i'm trying to do the purest comparison of cpu usage possible. leaving it
up to others to extrapolate the results to what happens with the effectively
bigger block cache, more rows/block fetched from disk for equivalent size
block, etc. i'm currently just standing up a StoreFile and using it directly.
see
https://github.com/hotpads/hbase-prefix-trie/blob/hcell-scanners/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek/SeekBenchmarkMain.java.
i'll try to factor in network and disk latencies in later tests (did some
preliminary tests friday but am on vacation this week).

{quote}How did you measure cycles per seek?{quote}
simple assumption of 2 billion/second. was just trying to emphasize how many
cycles a seek currently takes

{quote}What is numBlocks? The total number of blocks the dataset fit in?{quote}
number of data blocks in the HFile i fed in

Prefix Compression - Trie data block encoding
-

[jira] [Updated] (HBASE-5596) Few minor bugs from HBASE-5209


 [ 
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-5596:
--

   Resolution: Fixed
Fix Version/s: 0.96.0
   0.94.0
   0.92.2
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Few minor bugs from HBASE-5209
 --

 Key: HBASE-5596
 URL: https://issues.apache.org/jira/browse/HBASE-5596
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch


 A few leftover bugs from HBASE-5209.  Comments are documented here:
 https://reviews.apache.org/r/3892/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4993) Performance regression in minicluster creation

2012-03-26 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238962#comment-13238962
 ] 

Jean-Daniel Cryans commented on HBASE-4993:
---

It seems this would work better, also reviewing the 0.90/0.92 code it think we 
should keep the new logic you introduced in this jira (with the fixed code). I 
opened HBASE-5639 and assigned it to you.

 Performance regression in minicluster creation
 --

 Key: HBASE-4993
 URL: https://issues.apache.org/jira/browse/HBASE-4993
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.94.0

 Attachments: 4993.patch, 4993.v3.patch


 Side effect of 4610: the mini cluster needs 4,5 seconds to start

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-5639) The logic used in waiting for region servers during startup is broken

2012-03-26 Thread Jean-Daniel Cryans (Created) (JIRA)

The logic used in waiting for region servers during startup is broken
-

 Key: HBASE-5639
 URL: https://issues.apache.org/jira/browse/HBASE-5639
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: nkeywal
Priority: Blocker
 Fix For: 0.94.0


See the tail of HBASE-4993, which I'll report here:

Me:
{quote}
I think a bug was introduced here. Here's the new waiting logic in 
waitForRegionServers:

the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   there have been no new region server in for
  'hbase.master.wait.on.regionservers.interval' time

And the code that verifies that:

!(lastCountChange+interval  now  count = minToStart)
{quote}

Nic:
{quote}
It seems that changing the code to

(count  minToStart ||
lastCountChange+interval  now)

would make the code works as documented.
If you have 0 region servers that checked in and you are under the interval, 
you wait: (true or true) = true.
If you have 0 region servers but you are above the interval, you wait: (true or 
false) = true.
If you have 1 or more region servers that checked in and you are under the 
interval, you wait: (false or true) = true.
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-03-26 Thread Jesse Yates (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238965#comment-13238965
 ] 

Jesse Yates commented on HBASE-5547:


Thoughts on how long we should keep around files? Indefinitely? The latter 
seems a bit excessive, especially if a 'backup mode' ensures you run every X 
minutes (and exports to another cluster, moves the files to another backup 
directory). 'Cleanup' in implies you want to remove the file when no one care 
about the hfiles anymore - thinking maybe a periodic chore on the rs?

With snapshots, I was expecting to add an file reference feature - essentially 
doing impl hardlinks for files we care about keeping around. Was thinking we 
could add a CP hook and impl that would let you add a checks (config based?) 
for if you want to keep a reference around for the file being cleaned up. In 
the backup situation, you would have a timer or (maybe check for a backup 
completed file/meta row) and see if you had elapsed that time or not; if not, 
you would add a reference, if so, do nothing and let the file get cleaned up.

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be delete to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238963#comment-13238963
 ] 

Lars Hofhansl commented on HBASE-5623:
--

Good to know :)

 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls
 {code} 
 logSyncerThread.hlogFlush(this.writer);
 {code}
 without holding the updateLock. LogSyncer only synchronizes against 
 concurrent appends and flush(), but not on the passed writer, which can be 
 closed already by rollWriter(). In this case, since 
 SequenceFile#Writer.close() sets it's out field as null, we get the NPE. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-03-26 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238968#comment-13238968
]

Lars Hofhansl commented on HBASE-5547:
--

Was thinking that these files are up grabs unless HBase is in backup mode,
maybe by the existence of a specific zNode, although other models are possible
as well.
The details are a bit tricky, of course. Do we need a full cleanup between
backup runs so that we do not confuse the backup files? If not, do we tag with
a backup number or a with timestamp (like the HLogs)?

If we do HBASE-50 we won't need this one methinks. This might get us to
workable solution more quickly.

Don't delete HFiles when in backup mode
-

Key: HBASE-5547
URL: https://issues.apache.org/jira/browse/HBASE-5547
Project: HBase
Issue Type: New Feature
Reporter: Lars Hofhansl

This came up in a discussion I had with Stack.
It would be nice if HBase could be notified that a backup is in progress (via
a znode for example) and in that case either:
1. rename HFiles to be delete to file.bck
2. rename the HFiles into a special directory
3. rename them to a general trash directory (which would not need to be tied
to backup mode).
That way it should be able to get a consistent backup based on HFiles (HDFS
snapshots or hard links would be better options here, but we do not have
those).
#1 makes cleanup a bit harder.

[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

[
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238972#comment-13238972
]

jirapos...@reviews.apache.org commented on HBASE-5619:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4054/#review6365
---

Excellent.

It looks like we can commit this w/o breaking whats currently there?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13813

What did we figure on the package name? Shouldn't it agree w/ the dir that
holds the .proto files at src/main? Currently one is protobuf and the other is
proto.

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13835

What are these two booleans broken out? Aren't they in they attributes of
HRI already? Why repeat them?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13836

Good. Its kinda dumb the way our HRegionInterface is now where it has an
override, one that takes single family and another that takes an array. Thanks
for collapsing.

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13837

Should we be repeating the API documentation that is up in the
HRegionInterface that this .proto replaces here? If its not here, where will
it be? Not all of the javadoc should make it over -- the stuff that says
nothing shouldn't but some is of worth. What you think?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13838

Again, thanks for doing the work collapsing the overrides that are up in
HRegionInterface.

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13839

Isn't the response currently a void? And isnt' flush async (IIRC). If
so, under what circumstance would we be able to fill out this response?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13842

WALKey maps to HLogKey? Maybe add a comment to this effect?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13846

Good. I like this method name better. Should be a comment which points
back to the old name? Or what you think?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13847

Yeah, this proto is missing commentary. I mean, the return here should be
explained?

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13850

This will for sure grow w/ time.

src/main/proto/RegionAdmin.proto
https://reviews.apache.org/r/4054/#comment13855

Nice. So you are splitting HRegionInterface into admin and client? Good.

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13862

This is new? Being able to do this? How will it be used?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13863

This is new feature on get? Or just special handling of an attribute?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13864

We don't have this in the java Result. I don't understand why this is
making its way into the object.

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13865

ditto

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13866

what is processed?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13867

Why we need to add a region to the Get even optionally?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13868

Why is the Get polluted by multiGet stuff?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13869

I thought we could set this into the Get above? Why have it here as
separate param?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13870

Good stuff

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13872

This is a new feature?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13881

A type rather than have the mutation type specified as a subclass?

A mutate is a delete or put?

src/main/proto/RegionClient.proto
https://reviews.apache.org/r/4054/#comment13882

Again, does it make sense having this in here? I mean regions come and go
-- e.g. split -- so you could have reference to non-existent region. This
stuff of tying a mutation to a particular region should be done external to the
mutations themselves?

Is this trying to do multiaction? Maybe that should be a message apart
from mutate? The new message would have region name and the mutate?

[jira] [Commented] (HBASE-5639) The logic used in waiting for region servers during startup is broken

2012-03-26 Thread Jean-Daniel Cryans (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238973#comment-13238973
 ] 

Jean-Daniel Cryans commented on HBASE-5639:
---

Oh I forgot to mention that I'm marking this as a blocker for 0.94.0 because 
right now if you start a sizable cluster you may end up with region servers 
that checkin too late and miss the re-assignment of regions.

 The logic used in waiting for region servers during startup is broken
 -

 Key: HBASE-5639
 URL: https://issues.apache.org/jira/browse/HBASE-5639
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Assignee: nkeywal
Priority: Blocker
 Fix For: 0.94.0


 See the tail of HBASE-4993, which I'll report here:
 Me:
 {quote}
 I think a bug was introduced here. Here's the new waiting logic in 
 waitForRegionServers:
 the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
there have been no new region server in for
   'hbase.master.wait.on.regionservers.interval' time
 And the code that verifies that:
 !(lastCountChange+interval  now  count = minToStart)
 {quote}
 Nic:
 {quote}
 It seems that changing the code to
 (count  minToStart ||
 lastCountChange+interval  now)
 would make the code works as documented.
 If you have 0 region servers that checked in and you are under the interval, 
 you wait: (true or true) = true.
 If you have 0 region servers but you are above the interval, you wait: (true 
 or false) = true.
 If you have 1 or more region servers that checked in and you are under the 
 interval, you wait: (false or true) = true.
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode


[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238977#comment-13238977
 ] 

stack commented on HBASE-5547:
--

bq. Thoughts on how long we should keep around files? Indefinitely? 

Yes. I'd think some other process other than hbase would be responsible for 
their cleanup.

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be delete to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-2600) Change how we do meta tables; from tablename+STARTROW+randomid to instead, tablename+ENDROW+randomid

[
https://issues.apache.org/jira/browse/HBASE-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238990#comment-13238990
]

Hadoop QA commented on HBASE-2600:
--

-1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12520024/hbase-2600-root.dir.tgz
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 patch. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/1315//console

This message is automatically generated.

Change how we do meta tables; from tablename+STARTROW+randomid to instead,
tablename+ENDROW+randomid

Key: HBASE-2600
URL: https://issues.apache.org/jira/browse/HBASE-2600
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: Alex Newman
Attachments:
0001-Changed-regioninfo-format-to-use-endKey-instead-of-s.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v2.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v4.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v6.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v7.2.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v8.1,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen-v9.patch,
0001-HBASE-2600.-Change-how-we-do-meta-tables-from-tablen.patch,
2600-trunk-01-17.txt, HBASE-2600+5217-Sun-Mar-25-2012-v3.patch,
HBASE-2600+5217-Sun-Mar-25-2012-v4.patch, hbase-2600-root.dir.tgz, jenkins.pdf

This is an idea that Ryan and I have been kicking around on and off for a
while now.
If regionnames were made of tablename+endrow instead of tablename+startrow,
then in the metatables, doing a search for the region that contains the
wanted row, we'd just have to open a scanner using passed row and the first
row found by the scan would be that of the region we need (If offlined
parent, we'd have to scan to the next row).
If we redid the meta tables in this format, we'd be using an access that is
natural to hbase, a scan as opposed to the perverse, expensive
getClosestRowBefore we currently have that has to walk backward in meta
finding a containing region.
This issue is about changing the way we name regions.
If we were using scans, prewarming client cache would be near costless (as
opposed to what we'll currently have to do which is first a
getClosestRowBefore and then a scan from the closestrowbefore forward).
Converting to the new method, we'd have to run a migration on startup
changing the content in meta.
Up to this, the randomid component of a region name has been the timestamp of
region creation. HBASE-2531 32-bit encoding of regionnames waaay
too susceptible to hash clashes proposes changing the randomid so that it
contains actual name of the directory in the filesystem that hosts the
region. If we had this in place, I think it would help with the migration to
this new way of doing the meta because as is, the region name in fs is a hash
of regionname... changing the format of the regionname would mean we generate
a different hash... so we'd need hbase-2531 to be in place before we could do
this change.

[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209


[ 
https://issues.apache.org/jira/browse/HBASE-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239002#comment-13239002
 ] 

Hudson commented on HBASE-5596:
---

Integrated in HBase-0.94-security #3 (See 
[https://builds.apache.org/job/HBase-0.94-security/3/])
HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662)

 Result = ABORTED
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Few minor bugs from HBASE-5209
 --

 Key: HBASE-5596
 URL: https://issues.apache.org/jira/browse/HBASE-5596
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.1, 0.94.0
Reporter: David S. Wang
Assignee: David S. Wang
Priority: Minor
 Fix For: 0.92.2, 0.94.0, 0.96.0

 Attachments: HBASE-5596.patch, hbase-5596-0.94.patch


 A few leftover bugs from HBASE-5209.  Comments are documented here:
 https://reviews.apache.org/r/3892/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5533) Add more metrics to HBase


[ 
https://issues.apache.org/jira/browse/HBASE-5533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239000#comment-13239000
 ] 

Hudson commented on HBASE-5533:
---

Integrated in HBase-0.94-security #3 (See 
[https://builds.apache.org/job/HBase-0.94-security/3/])
HBASE-5533 Add more metrics to HBase (Shaneal Manek) (Revision 1305582)

 Result = ABORTED
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/ExactCounterMetric.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/ExponentiallyDecayingSample.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/MetricsHistogram.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Sample.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/Snapshot.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/metrics/histogram/UniformSample.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/Threads.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestExactCounterMetric.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestExponentiallyDecayingSample.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/metrics/TestMetricsHistogram.java


 Add more metrics to HBase
 -

 Key: HBASE-5533
 URL: https://issues.apache.org/jira/browse/HBASE-5533
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.2, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Minor
 Fix For: 0.94.0, 0.96.0

 Attachments: BlockingQueueContention.java, HBASE-5533-0.92-v4.patch, 
 HBASE-5533-TRUNK-v6.patch, HBASE-5533-TRUNK-v6.patch, TimingOverhead.java, 
 hbase-5533-0.92.patch, hbase5533-0.92-v2.patch, hbase5533-0.92-v3.patch, 
 hbase5533-0.92-v5.patch, histogram_web_ui.png


 To debug/monitor production clusters, there are some more metrics I wish I 
 had available.
 In particular:
 - Although the average FS latencies are useful, a 'histogram' of recent 
 latencies (90% of reads completed in under 100ms, 99% in under 200ms, etc) 
 would be more useful
 - Similar histograms of latencies on common operations (GET, PUT, DELETE) 
 would be useful
 - Counting the number of accesses to each region to detect hotspotting
 - Exposing the current number of HLog files

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5623) Race condition when rolling the HLog and hlogFlush


[ 
https://issues.apache.org/jira/browse/HBASE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239001#comment-13239001
 ] 

Hudson commented on HBASE-5623:
---

Integrated in HBase-0.94-security #3 (See 
[https://builds.apache.org/job/HBase-0.94-security/3/])
HBASE-5623 Race condition when rolling the HLog and hlogFlush (Enis 
Soztutar and LarsH) (Revision 1305549)

 Result = ABORTED
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogWriter.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollingNoCluster.java


 Race condition when rolling the HLog and hlogFlush
 --

 Key: HBASE-5623
 URL: https://issues.apache.org/jira/browse/HBASE-5623
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5623-suggestion.txt, 5623-v7.txt, 5623-v8.txt, 5623.txt, 
 5623v2.txt, HBASE-5623_v0.patch, HBASE-5623_v4.patch, HBASE-5623_v5.patch, 
 HBASE-5623_v6-alt.patch, HBASE-5623_v6-alt.patch


 When doing a ycsb test with a large number of handlers 
 (regionserver.handler.count=60), I get the following exceptions:
 {code}
 Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
 java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.getLength(SequenceFile.java:1099)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.getLength(SequenceFileLogWriter.java:314)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1291)
   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1388)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:920)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:152)
   at $Proxy1.multi(Unknown Source)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1691)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1689)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:214)
 {code}
 and 
 {code}
   java.lang.NullPointerException
   at 
 org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1026)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1068)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1035)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:279)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.hlogFlush(HLog.java:1237)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1271)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1391)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:2192)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1985)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3400)
   at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:366)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1351)
 {code}
 It seems the root cause of the issue is that we open a new log writer and 
 close the old one at HLog#rollWriter() holding the updateLock, but the other 
 threads doing syncer() calls

[jira] [Commented] (HBASE-5190) Limit the IPC queue size based on calls' payload size


[ 
https://issues.apache.org/jira/browse/HBASE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238999#comment-13238999
 ] 

Hudson commented on HBASE-5190:
---

Integrated in HBase-0.94-security #3 (See 
[https://builds.apache.org/job/HBase-0.94-security/3/])
HBASE-5190 Limit the IPC queue size based on calls' payload size
   (Ted's addendum) (Revision 1305469)

 Result = ABORTED
jdcryans : 
Files : 
* 
/hbase/branches/0.94/security/src/main/java/org/apache/hadoop/hbase/ipc/SecureServer.java


 Limit the IPC queue size based on calls' payload size
 -

 Key: HBASE-5190
 URL: https://issues.apache.org/jira/browse/HBASE-5190
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.96.0

 Attachments: 5190.addendum, HBASE-5190-v2.patch, HBASE-5190-v3.patch, 
 HBASE-5190.patch


 Currently we limit the number of calls in the IPC queue only on their count. 
 It used to be really high and was dropped down recently to num_handlers * 10 
 (so 100 by default) because it was easy to OOME yourself when huge calls were 
 being queued. It's still possible to hit this problem if you use really big 
 values and/or a lot of handlers, so the idea is that we should take into 
 account the payload size. I can see 3 solutions:
  - Do the accounting outside of the queue itself for all calls coming in and 
 out and when a call doesn't fit, throw a retryable exception.
  - Same accounting but instead block the call when it comes in until space is 
 made available.
  - Add a new parameter for the maximum size (in bytes) of a Call and then set 
 the size the IPC queue (in terms of the number of items) so that it could 
 only contain as many items as some predefined maximum size (in bytes) for the 
 whole queue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5615) the master never does balance because of balancing the parent region


[ 
https://issues.apache.org/jira/browse/HBASE-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239003#comment-13239003
 ] 

Hudson commented on HBASE-5615:
---

Integrated in HBase-0.94-security #3 (See 
[https://builds.apache.org/job/HBase-0.94-security/3/])
HBASE-5615 the master never does balance because of balancing the parent 
region (Xufeng) (Revision 1305172)

 Result = ABORTED
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


 the master never does balance because of balancing the parent region
 

 Key: HBASE-5615
 URL: https://issues.apache.org/jira/browse/HBASE-5615
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.7
Reporter: xufeng
Assignee: xufeng
Priority: Critical
 Fix For: 0.90.7, 0.92.2, 0.94.0, 0.96.0

 Attachments: 5615-trunk.txt, HBASE-5615-90.patch, HBASE-5615.patch, 
 NoPatched-surefire-report-5615-90.html, Patched_surefire-report-5615-90.html


 the master never do balance becauseof when master do rebuildUserRegions()，it 
 will add the parent region into  AssignmentManager#servers,
 if balancer let the parent region to move,the parent will in RIT forever.thus 
 balance will never be executed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5209) HConnection/HMasterInterface should allow for way to get hostname of currently active master in multi-master HBase setup


[ 
https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13239008#comment-13239008
 ] 

Hudson commented on HBASE-5209:
---

Integrated in HBase-0.94 #58 (See 
[https://builds.apache.org/job/HBase-0.94/58/])
HBASE-5596 Few minor bugs from HBASE-5209 (David S. Wang) (Revision 1305662)

 Result = ABORTED
jmhsieh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ServerName.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 HConnection/HMasterInterface should allow for way to get hostname of 
 currently active master in multi-master HBase setup
 

 Key: HBASE-5209
 URL: https://issues.apache.org/jira/browse/HBASE-5209
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.5, 0.92.0, 0.94.0
Reporter: Aditya Acharya
Assignee: David S. Wang
 Fix For: 0.92.1, 0.94.0

 Attachments: 5209.addendum, HBASE_5209_v5.diff


 I have a multi-master HBase set up, and I'm trying to programmatically 
 determine which of the masters is currently active. But the API does not 
 allow me to do this. There is a getMaster() method in the HConnection class, 
 but it returns an HMasterInterface, whose methods do not allow me to find out 
 which master won the last race. The API should have a 
 getActiveMasterHostname() or something to that effect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5596) Few minor bugs from HBASE-5209