[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154933#comment-13154933
 ] 

nkeywal commented on HBASE-4832:


One small comment: the timeout on the method (@Test(timeout=timeout)) conflicts 
with the timeout of the sleep (Thread.sleep(timeout)). As both are set to the 
same value (30 seconds), either one can fire first, which makes failure 
analysis more complex. I think we can remove the timeout on the method; the 
test itself ensures that it won't last forever.
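
To illustrate the point about a test bounding its own duration, here is a minimal Java sketch of a polling wait with a single deadline. The class BoundedWait and the helper waitFor are hypothetical names for illustration and are not taken from the test or the patch. With one deadline owned by the test body, a timeout has only one possible source.

```java
import java.util.function.BooleanSupplier;

public class BoundedWait {
    /**
     * Polls the condition until it holds or timeoutMs elapses.
     * Returns true if the condition held, false on timeout.
     * (Hypothetical helper, not part of HBase or JUnit.)
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller can fail with its own message
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Assumed condition for the demo: becomes true about 100 ms after start.
        boolean met = waitFor(30_000, 10, () -> System.currentTimeMillis() - start >= 100);
        System.out.println(met ? "condition met" : "timed out waiting for condition");
    }
}
```

A test written this way fails with a message of its own choosing when the deadline passes, so a separate timeout on the test method with the same value adds nothing but ambiguity.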

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop 
 that fast.
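
 A minimal Java sketch of why a notifyAll() on the wrong monitor has no effect, and how a sleeper that owns its lock can be woken. The Sleeper class below is a hypothetical stand-in and not HBase's actual Sleeper; only the method name skipSleepCycle() is taken from the description above.

```java
import java.util.concurrent.TimeUnit;

public class SleeperSketch {
    /** Hypothetical sleeper that waits on its own private lock. */
    static final class Sleeper {
        private final Object sleepLock = new Object();
        private final long periodMs;
        private boolean wakeRequested = false;

        Sleeper(long periodMs) { this.periodMs = periodMs; }

        /** Sleeps up to periodMs; returns true if woken early by skipSleepCycle(). */
        boolean sleep() throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(periodMs);
            synchronized (sleepLock) {
                while (!wakeRequested) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        return false; // full period elapsed, nobody woke us
                    }
                    sleepLock.wait(remainingMs);
                }
                wakeRequested = false;
                return true;
            }
        }

        /** Wakes the sleeping thread by notifying the monitor it actually waits on. */
        void skipSleepCycle() {
            synchronized (sleepLock) {
                wakeRequested = true;
                sleepLock.notifyAll();
            }
        }
    }

    private static Thread startSleeper(Sleeper sleeper, boolean[] wokenEarly) {
        Thread t = new Thread(() -> {
            try {
                wokenEarly[0] = sleeper.sleep();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Uses skipSleepCycle(); returns true if the sleeper woke before its period ended. */
    static boolean wakesEarly() throws InterruptedException {
        Sleeper sleeper = new Sleeper(10_000);
        boolean[] wokenEarly = new boolean[1];
        Thread t = startSleeper(sleeper, wokenEarly);
        sleeper.skipSleepCycle();
        t.join(5_000);
        return wokenEarly[0] && !t.isAlive();
    }

    /** Notifies an unrelated monitor; returns true only if the sleeper woke early. */
    static boolean nakedNotifyWakes() throws InterruptedException {
        Sleeper sleeper = new Sleeper(300);
        Object unrelated = new Object();
        boolean[] wokenEarly = new boolean[1];
        Thread t = startSleeper(sleeper, wokenEarly);
        synchronized (unrelated) {
            unrelated.notifyAll(); // nobody waits on this monitor
        }
        t.join();
        return wokenEarly[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("skipSleepCycle wakes early: " + wakesEarly());
        System.out.println("naked notify wakes early:   " + nakedNotifyWakes());
    }
}
```

 The wakeRequested flag is there so that a wake-up requested just before the thread starts waiting is not lost.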
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at 

[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154952#comment-13154952
 ] 

Hudson commented on HBASE-4842:
---

Integrated in HBase-0.92-security #6 (See 
[https://builds.apache.org/job/HBase-0.92-security/6/])
HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  Sometimes the RS in ZK points to the same RS, but 
 sometimes it moves to another RS. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+

2011-11-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154951#comment-13154951
 ] 

Hudson commented on HBASE-4830:
---

Integrated in HBase-0.92-security #6 (See 
[https://builds.apache.org/job/HBase-0.92-security/6/])
HBASE-4830 Regionserver BLOCKED on WAITING 
DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/bin/hbase


 Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno 
 running 0.20.205.0+
 ---

 Key: HBASE-4830
 URL: https://issues.apache.org/jira/browse/HBASE-4830
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4830-v2.txt, 4830.txt, 
 hbase-stack-regionserver-sv4r9s38.out


 Running 0.20.205.1 (I was not at tip of the branch) I ran into the following 
 hung regionserver:
 {code}
 "regionserver7003.logRoller" daemon prio=10 tid=0x7fd98028f800 nid=0x61af in Object.wait() [0x7fd987bfa000]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         at java.lang.Object.wait(Object.java:485)
         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
         - locked <0xf8656788> (a java.util.LinkedList)
         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
         - locked <0xf8656458> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
         at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
         - locked <0xf8655998> (a org.apache.hadoop.io.SequenceFile$Writer)
         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
         at org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
         at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
         - locked <0xc443deb0> (a java.lang.Object)
         at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
         at java.lang.Thread.run(Thread.java:662)
 {code}
 Other threads are like this (here's a sample):
 {code}
 "regionserver7003.logSyncer" daemon prio=10 tid=0x7fd98025e000 nid=0x61ae waiting for monitor entry [0x7fd987cfb000]
    java.lang.Thread.State: BLOCKED (on object monitor)
         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
         - waiting to lock <0xc443deb0> (a java.lang.Object)
         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
         at java.lang.Thread.run(Thread.java:662)
 
 "IPC Server handler 0 on 7003" daemon prio=10 tid=0x7fd98049b800 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
    java.lang.Thread.State: BLOCKED (on object monitor)
         at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
         - waiting to lock <0xc443deb0> (a java.lang.Object)
         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
         at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
         at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
         at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
 {code}
 Looks like HDFS-1529?  (Todd?)
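
 The pattern visible in these dumps (one thread holds a monitor while it waits for something that never arrives, and every other thread that needs the same monitor shows up as BLOCKED) can be reproduced with a small Java sketch. All names below are hypothetical stand-ins; this is not HLog or DFSClient code, and the "ack" is simulated with a latch.

```java
import java.util.concurrent.CountDownLatch;

public class HeldMonitorDemo {
    /**
     * Starts a "roller" that takes updateLock and then waits for an ack,
     * plus a "syncer" that needs the same lock. Returns the syncer's thread
     * state as observed while the roller is still stuck.
     */
    static Thread.State observeSyncerState() throws InterruptedException {
        final Object updateLock = new Object();            // stands in for the shared monitor
        final CountDownLatch ack = new CountDownLatch(1);  // stands in for the missing ack
        final CountDownLatch rollerHoldsLock = new CountDownLatch(1);

        Thread roller = new Thread(() -> {
            synchronized (updateLock) {
                rollerHoldsLock.countDown();
                try {
                    ack.await(); // blocks while still holding updateLock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "roller");

        Thread syncer = new Thread(() -> {
            synchronized (updateLock) {
                // a real syncer would flush edits here
            }
        }, "syncer");

        roller.start();
        rollerHoldsLock.await();
        syncer.start();

        // Poll until the syncer is seen waiting for the monitor (bounded to 5 s).
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (syncer.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
            Thread.sleep(5);
        }
        Thread.State observed = syncer.getState();

        ack.countDown(); // release the roller so both threads can finish
        roller.join();
        syncer.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("syncer state while roller waits: " + observeSyncerState());
    }
}
```

 In the sketch the ack eventually arrives; in the dump above it apparently never does, so the blocked threads stay blocked.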





[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13154957#comment-13154957
 ] 

Hadoop QA commented on HBASE-4120:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504706/TablePriority_v8.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -143 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 88 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.master.TestRestartCluster
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildBase
  org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildOverlap

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/330//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/330//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/330//console

This message is automatically generated.

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 A large HBase cluster with many applications running on it runs into many 
 problems. At Taobao there is a cluster that many departments use to test the 
 performance of their HBase-based applications. With one cluster of 12 
 servers, only one application can run exclusively on the cluster at a time, 
 and many other applications must wait until the previous test has finished.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. Also, if the test engineer wants 
 to make sure there is no interference, he/she can move other tables out of 
 this group.
 Within groups we use table priority to allocate resources: when the system is 
 busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry : https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.
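
 As a rough illustration of the table-priority idea, the Java sketch below orders pending requests by a per-table priority so that high-priority tables are served first when the queue backs up. This is only an assumption about how such scheduling could look; the class names, the "lower number is more urgent" convention, and the queue itself are hypothetical and are not taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;

public class TablePrioritySketch {
    /** A pending request tagged with its table's priority (lower number = more urgent, by assumption). */
    static final class Call implements Comparable<Call> {
        final String table;
        final int priority;
        final long seq; // arrival order, keeps FIFO within one priority level

        Call(String table, int priority, long seq) {
            this.table = table;
            this.priority = priority;
            this.seq = seq;
        }

        @Override
        public int compareTo(Call other) {
            int byPriority = Integer.compare(priority, other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    /** Returns the table names in the order a handler would pick the calls up. */
    static List<String> serviceOrder(List<Call> arrivals) {
        PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
        queue.addAll(arrivals);
        List<String> order = new ArrayList<>();
        Call next;
        while ((next = queue.poll()) != null) {
            order.add(next.table);
        }
        return order;
    }

    public static void main(String[] args) {
        List<Call> arrivals = Arrays.asList(
            new Call("logs", 5, 0),
            new Call("orders", 1, 1),
            new Call("logs", 5, 2),
            new Call("orders", 1, 3));
        System.out.println(serviceOrder(arrivals));
    }
}
```

 A strict priority queue like this can starve low-priority tables under sustained load, which a real scheduler would need to address.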





[jira] [Created] (HBASE-4846) HBase can't find the native lib

2011-11-22 Thread Aaron Guo (Created) (JIRA)
HBase can't find the native lib
---

 Key: HBASE-4846
 URL: https://issues.apache.org/jira/browse/HBASE-4846
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.4
Reporter: Aaron Guo


The shell script ${HBASE_HOME}/bin/hbase loads the Hadoop native lib like this:

if [ -d "/usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM}" ] ; then
  JAVA_LIBRARY_PATH=$(append_path "${JAVA_LIBRARY_PATH}" "/usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM}")
fi

It should work like this:

if [ -d "${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}" ] ; then
  JAVA_LIBRARY_PATH=$(append_path "${JAVA_LIBRARY_PATH}" "${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}")
fi





[jira] [Updated] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4739:
--

Attachment: HBASE-4739_trial6.patch

Thanks for your review. Fixed all comments


 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when 
 the master died, it had just created the RIT znode for a region but had not 
 yet told the RS to close it.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155048#comment-13155048
 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
---

+1 for HBASE-4739_trial6.patch.  The state name change is to indicate that the 
node is created by master.  









[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread gaojinchao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155050#comment-13155050
 ] 

gaojinchao commented on HBASE-4739:
---

This patch is not compatible: I added M_ZK_REGION_CLOSING and deleted 
RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question: can I delete the code block below in the function 
unassign(HRegionInfo region, boolean force)?

    } catch (NotServingRegionException nsre) {
      LOG.info("Server " + server + " returned " + nsre + " for " +
        region.getEncodedName());
      // Presume that master has stale data.  Presume remote side just split.
      // Presume that the split message when it comes in will fix up the master's
      // in memory cluster state.
    } catch (Throwable t)

I think we should use the wrap of RemoteException. 
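
One possible reading of the question above is that the remote exception reaches the caller inside a wrapper, so a catch clause for the inner type never fires and the handling has to unwrap first. The Java sketch below illustrates that reading only; NotServingRegion and RemoteWrapper are hypothetical stand-ins defined here, not the HBase or Hadoop classes, and the real unassign() code may differ.

```java
public class UnwrapSketch {
    /** Hypothetical stand-in for the server-side exception type. */
    static class NotServingRegion extends Exception {
        NotServingRegion(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for an RPC-layer wrapper around a remote failure. */
    static class RemoteWrapper extends Exception {
        RemoteWrapper(Throwable cause) { super(cause); }

        /** Returns the wrapped exception, or this wrapper if there is none. */
        Throwable unwrap() {
            return getCause() != null ? getCause() : this;
        }
    }

    /** Simulates an RPC that fails remotely: the client only ever sees the wrapper. */
    static void closeRegionRpc() throws NotServingRegion, RemoteWrapper {
        throw new RemoteWrapper(new NotServingRegion("region not online"));
    }

    /** Reports which catch clause ends up handling the failure. */
    static String unassign() {
        try {
            closeRegionRpc();
            return "sent";
        } catch (NotServingRegion nsre) {
            return "caught directly"; // not reached when the failure arrives wrapped
        } catch (Throwable t) {
            if (t instanceof RemoteWrapper) {
                t = ((RemoteWrapper) t).unwrap();
            }
            if (t instanceof NotServingRegion) {
                return "caught after unwrap";
            }
            return "unexpected: " + t;
        }
    }

    public static void main(String[] args) {
        System.out.println(unassign());
    }
}
```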






[jira] [Commented] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)

2011-11-22 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155067#comment-13155067
 ] 

nkeywal commented on HBASE-4825:


Can be committed. There is zookeeper.TestZooKeeperACL as well.

 TestRegionServersMetrics and TestZKLeaderManager are not categorized 
 (small/medium/large)
 -

 Key: HBASE-4825
 URL: https://issues.apache.org/jira/browse/HBASE-4825
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 4825_trunk_java.patch


 see title





[jira] [Updated] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4825:
---

Attachment: 4825_trunk_java.patch

categories added only






[jira] [Created] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Created) (JIRA)
Activate single jvm for small tests on jenkins
--

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor


This alone will not revolutionize performance. We will gain between 1 and 4 
minutes.

But we gain in other ways as well:
 - it's a step towards parallelizing the tests
 - new tests are less expensive as they do not create a new jvm: it's a 
continuous win
 - it will allow us to push it to the dev env while having the same env on dev 
and on build, and 3 minutes are 10% of small + medium tests execution time.

I will submit the patch a few times to see if it works well before asking for 
the real commit.





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Status: Patch Available  (was: Open)

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
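
 The ordering problem described above can be shown with a small Java sketch. FakeStore is a hypothetical in-memory stand-in for the coordination service and fires its delete notification synchronously, which makes the race deterministic; the real notification is asynchronous. Removing the region from the in-transition set before deleting the node is shown only as one way to avoid the warning, not as what the attached patch does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class RitOrderingSketch {
    /** Hypothetical in-memory store that notifies a listener when a node is deleted. */
    static final class FakeStore {
        private final Set<String> nodes = new HashSet<>();
        private Consumer<String> onDelete = path -> { };

        void create(String path) { nodes.add(path); }

        void setDeleteListener(Consumer<String> listener) { onDelete = listener; }

        void delete(String path) {
            if (nodes.remove(path)) {
                onDelete.accept(path); // in this sketch the notification fires before delete() returns
            }
        }
    }

    /** Handles a "region opened" event and returns the warnings the delete listener logged. */
    static List<String> handleOpened(boolean removeFromRitFirst) {
        FakeStore store = new FakeStore();
        Set<String> regionsInTransition = new HashSet<>();
        List<String> warnings = new ArrayList<>();
        String region = "region-1";

        store.create(region);
        regionsInTransition.add(region);
        store.setDeleteListener(path -> {
            if (regionsInTransition.contains(path)) {
                warnings.add("Node deleted but still in RIT: " + path);
            }
        });

        if (removeFromRitFirst) {
            regionsInTransition.remove(region);
            store.delete(region);
        } else {
            store.delete(region); // listener runs while the region is still in RIT
            regionsInTransition.remove(region);
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println("delete first: " + handleOpened(false));
        System.out.println("remove first: " + handleOpened(true));
    }
}
```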





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Attachment: HBASE-4308.patch

Test cases are passing.






[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Attachment: 4847_pom.patch






[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Status: Patch Available  (was: Open)






[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155128#comment-13155128
 ] 

Hudson commented on HBASE-4842:
---

Integrated in HBase-0.92 #159 (See 
[https://builds.apache.org/job/HBase-0.92/159/])
HBASE-4842 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The RS in ZK sometimes points to the same RS, but 
 sometimes it moves to another one.





[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155129#comment-13155129
 ] 

Hadoop QA commented on HBASE-4847:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504745/4847_pom.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause mvn compile goal to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/333//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/333//console

This message is automatically generated.

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155166#comment-13155166
 ] 

Hadoop QA commented on HBASE-4739:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504734/HBASE-4739_trial6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/331//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/331//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/331//console

This message is automatically generated.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when 
 the master died, it had just created the RIT znode for a region but hadn't 
 told the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Assigned] (HBASE-4841) If I call split fast enough, while inserting, rows disappear.

2011-11-22 Thread ramkrishna.s.vasudevan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-4841:
-

Assignee: ramkrishna.s.vasudevan

 If I call split fast enough, while inserting, rows disappear. 
 --

 Key: HBASE-4841
 URL: https://issues.apache.org/jira/browse/HBASE-4841
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Attachments: 1, log, log2


 I'll attach a unit test for this. Basically, if you call split while 
 inserting data, you can get to the point where the cluster becomes 
 unstable or rows disappear. The unit test gives you some flexibility 
 over:
 - How many rows
 - How wide the rows are
 - The frequency of the split. 
 The default settings crash unit tests or cause the unit tests to fail on my 
 laptop. On my MacBook Air, I could actually turn down the total number of 
 rows and the frequency of the splits, which is surprising. I think this is 
 because the MacBook Air has much better IO than my backup Acer.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Status: Patch Available  (was: Open)

with the right version...

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Attachment: 4847_pom.v2.patch

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Status: Open  (was: Patch Available)

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155198#comment-13155198
 ] 

Hadoop QA commented on HBASE-4308:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504738/HBASE-4308.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.client.TestInstantSchemaChange

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/332//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/332//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/332//console

This message is automatically generated.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
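 The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, fields, and method names below are hypothetical and are not HBase or ZooKeeper APIs, and the znode delete invokes its callback immediately to model the worst case in which the notification arrives before the region leaves RIT. Swapping the two steps is shown as one possible ordering, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the ordering problem; none of these names are HBase classes. */
public class RitOrderingSketch {
    // Stand-in for the master's regions-in-transition map.
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    // Warnings are collected instead of logged so the outcome can be inspected.
    final List<String> warnings = new ArrayList<>();

    // Stand-in for the watcher callback fired when the znode is deleted.
    void nodeDeleted(String region) {
        if (regionsInTransition.containsKey(region)) {
            warnings.add("Node deleted but still in RIT: " + region);
        }
    }

    // Stand-in for the znode delete; the callback runs immediately to model
    // the case where the notification beats the RIT removal.
    void deleteZnode(String region) {
        nodeDeleted(region);
    }

    // Ordering described in the report: delete the znode, then leave RIT.
    void handleOpenedDeleteFirst(String region) {
        deleteZnode(region);
        regionsInTransition.remove(region);
    }

    // Alternative ordering: leave RIT first, then delete the znode.
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        deleteZnode(region);
    }

    public static void main(String[] args) {
        RitOrderingSketch a = new RitOrderingSketch();
        a.regionsInTransition.put("r1", "OPEN");
        a.handleOpenedDeleteFirst("r1");
        System.out.println("delete first: " + a.warnings.size() + " warning(s)");

        RitOrderingSketch b = new RitOrderingSketch();
        b.regionsInTransition.put("r1", "OPEN");
        b.handleOpenedRemoveFirst("r1");
        System.out.println("remove first: " + b.warnings.size() + " warning(s)");
    }
}
```

 In this model the delete-first ordering records the warning and the remove-first ordering does not. Whether reordering is safe in the real master depends on what else reads RIT between the two steps, which this sketch does not model.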





[jira] [Commented] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155232#comment-13155232
 ] 

Hadoop QA commented on HBASE-4847:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504754/4847_pom.v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.TestFSTableDescriptors

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/334//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/334//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/334//console

This message is automatically generated.

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Status: Open  (was: Patch Available)

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Updated] (HBASE-4847) Activate single jvm for small tests on jenkins

2011-11-22 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4847:
---

Attachment: 4847_pom.v2.patch

 Activate single jvm for small tests on jenkins
 --

 Key: HBASE-4847
 URL: https://issues.apache.org/jira/browse/HBASE-4847
 Project: HBase
  Issue Type: Improvement
  Components: build, test
Affects Versions: 0.94.0
 Environment: build
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4847_pom.patch, 4847_pom.v2.patch, 4847_pom.v2.patch


 This will not revolutionize performance on its own: we will save between 1 and 4 
 minutes.
 But there are other gains as well:
  - it's a step toward parallelizing the tests
  - new tests are less expensive because they do not create a new JVM: it's a 
 continuous win
  - it will allow us to push the same setting to dev environments, so that dev and 
 build use the same environment, and 3 minutes is 10% of the small + medium test 
 execution time.
 I will submit the patch a few times to see whether it works well before asking 
 for the real commit.





[jira] [Created] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread stack (Created) (JIRA)
TestScanner failing because hostname can't be null
--

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.5








[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155250#comment-13155250
 ] 

stack commented on HBASE-4848:
--

This happened last night for builds #346 and #347, and it's happening locally for 
me. https://builds.apache.org/view/G-L/view/HBase/job/hbase-0.90/346/

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.5

 Attachments: 4848.txt








[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4848:
-

Attachment: 4848.txt

Small fix.

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.5

 Attachments: 4848.txt








[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4848:
-

 Assignee: stack
Affects Version/s: 0.90.5
   Status: Patch Available  (was: Open)

Committed to trunk and branches.

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5

 Attachments: 4848-092.txt, 4848.txt








[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155266#comment-13155266
 ] 

Hadoop QA commented on HBASE-4848:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504770/4848-092.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/335//console

This message is automatically generated.

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5

 Attachments: 4848-092.txt, 4848.txt








[jira] [Resolved] (HBASE-4825) TestRegionServersMetrics and TestZKLeaderManager are not categorized (small/medium/large)

2011-11-22 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4825.
--

   Resolution: Fixed
Fix Version/s: 0.94.0
 Hadoop Flags: Reviewed

Committed to trunk.  Thanks for patch N.

I mistakenly committed a change to TestCatalogTracker at the same time and then 
backed it out.

 TestRegionServersMetrics and TestZKLeaderManager are not categorized 
 (small/medium/large)
 -

 Key: HBASE-4825
 URL: https://issues.apache.org/jira/browse/HBASE-4825
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
 Fix For: 0.94.0

 Attachments: 4825_trunk_java.patch


 see title





[jira] [Updated] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4849:
-

Attachment: 4849.txt

Minor fix; pass HTU.getConfiguration()

 TestCatalogTracker can fail if an existing zookeeper running
 

 Key: HBASE-4849
 URL: https://issues.apache.org/jira/browse/HBASE-4849
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.92.0

 Attachments: 4849.txt


 This fact sunk my attempt at building an RC.  Fix.





[jira] [Created] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running

2011-11-22 Thread stack (Created) (JIRA)
TestCatalogTracker can fail if an existing zookeeper running


 Key: HBASE-4849
 URL: https://issues.apache.org/jira/browse/HBASE-4849
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.92.0
 Attachments: 4849.txt

This fact sunk my attempt at building an RC.  Fix.





[jira] [Resolved] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running

2011-11-22 Thread stack (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-4849.
--

Resolution: Fixed
  Assignee: stack

Committed branch and trunk.

 TestCatalogTracker can fail if an existing zookeeper running
 

 Key: HBASE-4849
 URL: https://issues.apache.org/jira/browse/HBASE-4849
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4849.txt


 This fact sunk my attempt at building an RC.  Fix.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Attachment: 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch

This patch is done with --no-prefix.

Please check it in.

Thanks a lot!

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug means WALs are 
 not getting cleaned, so I had 7000 regions to replay.  The distributed split 
 code did a nice job and the cluster came back, but what is interesting is that 
 some hot regions ended up with loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file takes about a 
 second to process (though there are only about 30-odd edits per file, it seems).  
 The region is unavailable during this time.
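 The idea in the issue title, skipping recovered.edits files whose edits are all older than what the region already has, can be sketched as a simple filter. This is a minimal illustration, not the attached patch: the EditsFile type, the filesToReplay method, and the assumption that each file's highest sequence id is known up front are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative filter; these names are hypothetical, not HBase classes. */
public class RecoveredEditsFilterSketch {

    /** Hypothetical descriptor: a file name plus the highest sequence id it holds. */
    static final class EditsFile {
        final String name;
        final long maxSeqId;

        EditsFile(String name, long maxSeqId) {
            this.name = name;
            this.maxSeqId = maxSeqId;
        }
    }

    /**
     * Returns the names of files that may contain edits newer than what the
     * region has already persisted. Files whose highest sequence id is less
     * than or equal to regionMaxSeqId are skipped without being opened.
     */
    static List<String> filesToReplay(List<EditsFile> files, long regionMaxSeqId) {
        List<String> result = new ArrayList<>();
        for (EditsFile f : files) {
            if (f.maxSeqId > regionMaxSeqId) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<EditsFile> files = Arrays.asList(
            new EditsFile("edits-a", 90L),
            new EditsFile("edits-b", 100L),
            new EditsFile("edits-c", 150L));
        // With the region already at sequence id 100, only edits-c needs replaying.
        System.out.println(filesToReplay(files, 100L));
    }
}
```

 With tens or hundreds of files at roughly a second each, skipping files by a cheap sequence-id comparison would shorten the window in which the region is unavailable, provided the per-file maximum sequence id can be obtained without reading the file.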





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4797:
-

Status: Open  (was: Patch Available)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4797:
-

Status: Patch Available  (was: Open)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155289#comment-13155289
 ] 

stack commented on HBASE-4842:
--

Committed to TRUNK for completeness' sake.

I've messed up your issue Jon.  Please forgive me.  Will we resolve this one 
and open a new one for the root issue?

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The assignment in ZK sometimes points to the same RS, 
 but sometimes it moves to another RS.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155292#comment-13155292
 ] 

stack commented on HBASE-4797:
--

@Jimmy Just FYI, since you are new, to trigger the build again, you need to 
re-upload the original patch or a new one (which you did), then (I think) you 
need to cancel and resubmit the patch.

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Status: Open  (was: Patch Available)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155298#comment-13155298
 ] 

Jimmy Xiang commented on HBASE-4797:


Thanks!  I cancelled and resubmitted the patch.

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Attachment: 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Status: Patch Available  (was: Open)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155299#comment-13155299
 ] 

Ted Yu commented on HBASE-4308:
---

Patch makes sense.
Minor comment:
{code}
+boolean deleteOpenedNode = false;
{code}
I think openedNodeDeleted would be a better name.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
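
A single-threaded sketch of the ordering described above, not the attached patch. It delivers the delete notification immediately after the znode is removed, which is the worst-case interleaving. OpenedRegionOrdering and all of its members are hypothetical stand-ins for AssignmentManager, RegionsInTransition and ZooKeeper:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a deterministic stand-in, not AssignmentManager code.
class OpenedRegionOrdering {
    final Set<String> regionsInTransition = new HashSet<>();
    final Set<String> znodes = new HashSet<>();   // stand-in for ZooKeeper
    int warnings = 0;

    void startOpening(String region) {
        regionsInTransition.add(region);
        znodes.add(region);
    }

    /** Stand-in for the delete notification reaching AssignmentManager. */
    void onNodeDeleted(String region) {
        if (regionsInTransition.contains(region)) {
            warnings++;   // corresponds to "Node deleted but still in RIT"
        }
    }

    /** Order from the issue: delete the znode, then clear RIT. */
    void finishDeleteFirst(String region) {
        znodes.remove(region);
        onNodeDeleted(region);   // worst case: notification arrives immediately
        regionsInTransition.remove(region);
    }

    /** Alternative order: clear RIT, then delete the znode. */
    void finishRemoveFirst(String region) {
        regionsInTransition.remove(region);
        znodes.remove(region);
        onNodeDeleted(region);
    }

    public static void main(String[] args) {
        OpenedRegionOrdering a = new OpenedRegionOrdering();
        a.startOpening("r1");
        a.finishDeleteFirst("r1");

        OpenedRegionOrdering b = new OpenedRegionOrdering();
        b.startOpening("r1");
        b.finishRemoveFirst("r1");

        System.out.println("delete-first warnings=" + a.warnings
            + ", remove-first warnings=" + b.warnings);
    }
}
```

Whether reordering is safe in the real master depends on what else reads RIT between the two steps, so this only illustrates why the warning appears.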





[jira] [Updated] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Jimmy Xiang (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-4797:
---

Status: Open  (was: Patch Available)

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155050#comment-13155050
 ] 

Ted Yu edited comment on HBASE-4739 at 11/22/11 6:05 PM:
-

This patch is not compatible: I added M_ZK_REGION_CLOSING and deleted 
RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question: can I delete the code block below in function 
unassign(HRegionInfo region, boolean force)?
{code}
 } catch (NotServingRegionException nsre) {
   LOG.info("Server " + server + " returned " + nsre + " for " +
     region.getEncodedName());
   // Presume that master has stale data.  Presume remote side just split.
   // Presume that the split message when it comes in will fix up the master's
   // in memory cluster state.
 } catch (Throwable t)
{code}
I think we should use the wrap of RemoteException. 

  was (Author: sunnygao):
This patch is not compatible, I added M_ZK_REGION_CLOSING and delete 
RS_ZK_REGION_CLOSING in EventHandler.java.

I have another question, Can I delete below code block in function 
unassign(HRegionInfo region, boolean force) ?

 } catch (NotServingRegionException nsre) {
   LOG.info("Server " + server + " returned " + nsre + " for " +
     region.getEncodedName());
   // Presume that master has stale data.  Presume remote side just split.
   // Presume that the split message when it comes in will fix up the master's
   // in memory cluster state.
 } catch (Throwable t)

I think we should use the wrap of RemoteException. 
  
 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155309#comment-13155309
 ] 

Ted Yu commented on HBASE-4739:
---

All the test failures were caused by 'Too many open files'.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Status: Patch Available  (was: Open)

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Attachment: HBASE-4308_1.patch

Updated patch addressing Ted's comments.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Updated] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4308:
--

Status: Open  (was: Patch Available)

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Commented] (HBASE-4773) HBaseAdmin leaks ZooKeeper connections

2011-11-22 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155319#comment-13155319
 ] 

ramkrishna.s.vasudevan commented on HBASE-4773:
---

@Xufeng
Have you tested the patch on a real cluster?


 HBaseAdmin leaks ZooKeeper connections
 --

 Key: HBASE-4773
 URL: https://issues.apache.org/jira/browse/HBASE-4773
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: gaojinchao
Priority: Critical
 Fix For: 0.90.5

 Attachments: 4773.patch


 When the master crashes, HBaseAdmin leaks ZooKeeper connections.
 I think we should close the ZK connection when MasterNotRunningException is thrown:
   public HBaseAdmin(Configuration c)
       throws MasterNotRunningException, ZooKeeperConnectionException {
     this.conf = HBaseConfiguration.create(c);
     this.connection = HConnectionManager.getConnection(this.conf);
     this.pause = this.conf.getLong("hbase.client.pause", 1000);
     this.numRetries = this.conf.getInt("hbase.client.retries.number", 10);
     this.retryLongerMultiplier =
         this.conf.getInt("hbase.client.retries.longer.multiplier", 10);
     // We should add this code to close the ZK connection
     try {
       this.connection.getMaster();
     } catch (MasterNotRunningException e) {
       HConnectionManager.deleteConnection(conf, false);
       throw e;
     }
   }
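
The same release-on-failure pattern can be sketched without any HBase classes. Everything below (AdminCleanupSketch, FakeConnection, the openConnections counter and the local MasterNotRunningException) is a hypothetical stand-in used only to show that a constructor which acquires a connection should release it before rethrowing:

```java
// Illustrative only: stand-in types, not the HBase client API.
class AdminCleanupSketch {
    /** Counts connections that were opened and not yet closed. */
    static int openConnections = 0;

    static class MasterNotRunningException extends Exception {
        MasterNotRunningException(String msg) { super(msg); }
    }

    static class FakeConnection {
        private final boolean masterUp;

        FakeConnection(boolean masterUp) {
            this.masterUp = masterUp;
            openConnections++;
        }

        void getMaster() throws MasterNotRunningException {
            if (!masterUp) {
                throw new MasterNotRunningException("master is not running");
            }
        }

        void close() {
            openConnections--;
        }
    }

    static class Admin {
        final FakeConnection connection;

        Admin(boolean masterUp) throws MasterNotRunningException {
            this.connection = new FakeConnection(masterUp);
            try {
                this.connection.getMaster();
            } catch (MasterNotRunningException e) {
                // The caller never receives this object, so nobody else can
                // close the connection: release it here, then rethrow.
                this.connection.close();
                throw e;
            }
        }
    }

    public static void main(String[] args) {
        int before = openConnections;
        try {
            new Admin(false);
        } catch (MasterNotRunningException e) {
            System.out.println("leaked connections: " + (openConnections - before));
        }
    }
}
```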





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155320#comment-13155320
 ] 

Ted Yu commented on HBASE-4739:
---

Nice work, Jinchao.

{code}
//RS_ZK_REGION_CLOSING  (1),   // Master adds this region as closing in ZK
{code}
We should say that this event is replaced by M_ZK_REGION_CLOSING.
{code}
// RS is already processing this region, only need update the timestamp
if(t instanceof RegionAlreadyInTransitionException){  
{code}
I think we should place 'else' in front of the 'if' above. The comment should read 
'need to update'.
Also leave a space between 'if' and '(', and between ')' and '{'.

Please also prepare a patch for 0.90.5.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.
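
 To make the failure mode concrete, here is a small sketch of a timeout check that retries the unassign for a region stuck in CLOSING instead of doing nothing. It is only an illustration of the idea discussed in this issue, not the attached patch; RitState, RitTimeoutSketch, Action and onTick are hypothetical names.

```java
/** Hypothetical region-in-transition states; only the ones needed here. */
enum RitState { OPENING, CLOSING, OPEN }

class RitTimeoutSketch {
  /** What a timeout monitor could decide for one region in transition. */
  enum Action { NONE, RETRY_UNASSIGN }

  static Action onTick(RitState state, long stateTimestampMs, long nowMs, long timeoutMs) {
    boolean timedOut = nowMs - stateTimestampMs > timeoutMs;
    if (!timedOut) {
      return Action.NONE;
    }
    // The log above reports "doing nothing" for CLOSING. This sketch retries the
    // unassign instead, which covers the case where the master died before the
    // close request ever reached the region server.
    if (state == RitState.CLOSING) {
      return Action.RETRY_UNASSIGN;
    }
    return Action.NONE;
  }
}
```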





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155328#comment-13155328
 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
---

@Ted
In 0.90.5 the CLOSING node is created by the RS.  Do we need to change that 
behaviour as well? 
Or will forcefully calling unassign() alone do?

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155330#comment-13155330
 ] 

Ted Yu commented on HBASE-4739:
---

We will keep RS_ZK_REGION_CLOSING event for 0.90.x
invokeUnassign() should still be called.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155346#comment-13155346
 ] 

Ted Yu commented on HBASE-4308:
---

+1 on patch v2.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.
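
 The ordering problem described above can be reproduced in a few lines. The sketch below simulates the watcher callback firing as soon as the znode is deleted and compares the two possible orders. RitRaceSketch and its methods are hypothetical names for illustration only; this is not the AssignmentManager code and it does not show the fix chosen in the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RitRaceSketch {
  final Set<String> regionsInTransition = new HashSet<>();
  final List<String> warnings = new ArrayList<>();

  /** Simulated watcher callback, fired as soon as the znode is deleted. */
  void nodeDeleted(String region) {
    if (regionsInTransition.contains(region)) {
      warnings.add("Node deleted but still in RIT: " + region);
    }
  }

  /** Order described in the issue: delete the znode first, then clear RIT. */
  void handleOpenedDeleteFirst(String region) {
    nodeDeleted(region); // stands in for delete(); the notification arrives at once
    regionsInTransition.remove(region);
  }

  /** Alternative order: clear RIT first, so the callback finds nothing to warn about. */
  void handleOpenedRemoveFirst(String region) {
    regionsInTransition.remove(region);
    nodeDeleted(region);
  }
}
```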





[jira] [Commented] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777

2011-11-22 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155360#comment-13155360
 ] 

Todd Lipcon commented on HBASE-3952:


We should put this in 0.90 before 0.90.5

 Guava snuck back in as a dependency via hbase-3777
 --

 Key: HBASE-3952
 URL: https://issues.apache.org/jira/browse/HBASE-3952
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0, 0.90.5

 Attachments: hcm.txt


 Undo it as we did in HBASE-3264.





[jira] [Updated] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777

2011-11-22 Thread Todd Lipcon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HBASE-3952:
---

Fix Version/s: 0.90.5
   0.92.0

 Guava snuck back in as a dependency via hbase-3777
 --

 Key: HBASE-3952
 URL: https://issues.apache.org/jira/browse/HBASE-3952
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0, 0.90.5

 Attachments: hcm.txt


 Undo it as we did in HBASE-3264.





[jira] [Resolved] (HBASE-3952) Guava snuck back in as a dependency via hbase-3777

2011-11-22 Thread Todd Lipcon (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HBASE-3952.


  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.90 branch.

 Guava snuck back in as a dependency via hbase-3777
 --

 Key: HBASE-3952
 URL: https://issues.apache.org/jira/browse/HBASE-3952
 Project: HBase
  Issue Type: Task
Reporter: stack
 Fix For: 0.92.0, 0.90.5

 Attachments: hcm.txt


 Undo it as we did in HBASE-3264.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155369#comment-13155369
 ] 

Hadoop QA commented on HBASE-4797:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504777/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/336//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/336//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/336//console

This message is automatically generated.

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.
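
 The idea in the issue title, skipping files whose edits are all older than what the region already has, can be sketched as a simple filter. The sketch assumes each recovered.edits file can be tagged with the highest sequence id it contains; how that id is obtained is not shown here. EditsFile, maxSeqId and filesToReplay are hypothetical names, not the API of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

class RecoveredEditsSketch {
  /** Hypothetical descriptor: a file name plus the highest sequence id in the file. */
  static final class EditsFile {
    final String name;
    final long maxSeqId;

    EditsFile(String name, long maxSeqId) {
      this.name = name;
      this.maxSeqId = maxSeqId;
    }
  }

  /** Returns only the files that may hold edits newer than what the region has. */
  static List<EditsFile> filesToReplay(List<EditsFile> files, long regionSeqId) {
    List<EditsFile> result = new ArrayList<>();
    for (EditsFile f : files) {
      // A file whose newest edit is not newer than the region's sequence id
      // cannot contribute anything, so it is skipped without being opened.
      if (f.maxSeqId > regionSeqId) {
        result.add(f);
      }
    }
    return result;
  }
}
```

 With roughly a second per file, as reported above, every skipped file would shorten the time the region stays unavailable.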





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155383#comment-13155383
 ] 

stack commented on HBASE-4308:
--

So, we are moving the call of regionOnline out of OpenRegionHandler and up as a 
reaction to the delete of znode in AM?  That looks like a good change.

What is odd though is that the log message -- 'Node deleted but still in RIT: ' 
-- gives the impression that something is wrong, though this is now the legit 
way of onlining a region in the master.  I'd suggest that we change the log 
message to 'Node deleted ...'.

Should this test which is in makeRegionOnline be up in the caller (You test 
SPLIT and SPLITTING in caller... it would make code easier to read):

{code}
if (rs.getState().equals(RegionState.State.OPEN))
{code}

Why don't we do rs.isOpened() instead of the above check?

Call the method makeRegionOnline instead of regionOnline?

This log message seems extraneous given the above logging of delete:

{code}
+    debugLog(regionInfo, "The znode of region "
+        + regionInfo.getRegionNameAsString() + " has been deleted.");
{code}

Otherwise patch looks good.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Commented] (HBASE-4797) [availability] Skip recovered.edits files with edits we know older than what region currently has

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155391#comment-13155391
 ] 

Hadoop QA commented on HBASE-4797:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504779/0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestMasterFailover
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/337//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/337//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/337//console

This message is automatically generated.

 [availability] Skip recovered.edits files with edits we know older than what 
 region currently has
 -

 Key: HBASE-4797
 URL: https://issues.apache.org/jira/browse/HBASE-4797
 Project: HBase
  Issue Type: Bug
  Components: performance
Reporter: stack
Assignee: Jimmy Xiang
Priority: Critical
  Labels: noob
 Fix For: 0.94.0

 Attachments: 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-[availability]-skip-older-edits.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch, 
 0001-HBASE-4797-availability-skip-files-with-edits-we-kno.patch


 Testing 0.92, I crashed all servers out.  Another bug makes it so WALs are 
 not getting cleaned so I had 7000 regions to replay.  The distributed split 
 code did a nice job and cluster came back but interesting is that some hot 
 regions ended up having loads of recovered.edits files -- tens if not 
 hundreds -- to replay against the region (can we bulk load recovered.edits 
 instead of replaying them?).  Each recovered.edits file is taking about a 
 second to process (though only about 30 odd edits per file it seems).  The 
 region is unavailable during this time.





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-22 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155394#comment-13155394
 ] 

Lars Hofhansl commented on HBASE-4838:
--

Still working on this. I have narrowed down the scenario (also happens with 
just 2 cells that are split), but don't have the root cause, yet.

I looked through the patch multiple times and I am pretty sure the changes are 
correct (in the sense that they represent the patch); there must be an 
assumption about some behavior in trunk that is not present in 0.92.

 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155408#comment-13155408
 ] 

Hadoop QA commented on HBASE-4308:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504781/HBASE-4308_1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.replication.TestReplication
  org.apache.hadoop.hbase.client.TestInstantSchemaChange

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/338//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/338//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/338//console

This message is automatically generated.

 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Updated] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-11-22 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4218:
---

Attachment: D447.4.patch

mbautin updated the revision [jira] [HBASE-4218] Delta encoding for keys in 
HFile.
Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

  Rebased on most recent changes in trunk, fixed conflicts. There are failing 
unit tests, and delta compression is not yet aware of the persistent memstore 
TS field added in 2856.

REVISION DETAIL
  https://reviews.facebook.net/D447

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BitsetKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CopyKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncodedBlock.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderAlgorithms.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderToSmallBufferException.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/EmptyBlockDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java
  src/main/ruby/hbase/admin.rb
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/io/deltaencoder/RedundantKVGenerator.java
  src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestBufferedDeltaEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/deltaencoder/TestDeltaEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlock.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockDeltaEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java
  src/test/java/org/apache/hadoop/hbase/regionserver/DeltaEncodingSeekPerformance.java
  src/test/java/org/apache/hadoop/hbase/regionserver/DeltaEncodingUtil.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java
  src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java


 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
  Labels: compression
 Attachments: D447.1.patch, D447.2.patch, D447.3.patch, D447.4.patch, 
 open-source.diff


 A compression for keys. Keys are 

[jira] [Commented] (HBASE-4846) HBase can't find the native lib

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155420#comment-13155420
 ] 

stack commented on HBASE-4846:
--

Can you make a patch, Aaron, and demonstrate that your fix works rather than 
what is currently there?  Thanks.

 HBase can't find the native lib
 ---

 Key: HBASE-4846
 URL: https://issues.apache.org/jira/browse/HBASE-4846
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.4
Reporter: Aaron Guo

 The shell script ${HBASE_HOME}/bin/hbase loads the Hadoop native lib like this:
 if [ -d "/usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM}" ] ; then
   JAVA_LIBRARY_PATH=$(append_path "${JAVA_LIBRARY_PATH}" /usr/lib/hadoop-0.20/lib/native/${JAVA_PLATFORM})
 fi
 It should work like this:
 if [ -d "${HADOOP_HOME}/lib/native/${JAVA_PLATFORM}" ] ; then
   JAVA_LIBRARY_PATH=$(append_path "${JAVA_LIBRARY_PATH}" ${HADOOP_HOME}/lib/native/${JAVA_PLATFORM})
 fi





[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155422#comment-13155422
 ] 

stack commented on HBASE-4832:
--

I can address N's comment on commit (removing the @Test timeout).

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that fast.
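
 The underlying Java rule is that notifyAll() only wakes threads waiting on the same monitor, so notifying on the region server object has no effect on a thread that sleeps elsewhere. The sketch below shows a sleeper that can be woken early because sleep() and skipSleepCycle() share one lock object. SleeperSketch is a hypothetical class written for this illustration; the real HBase Sleeper may be implemented differently.

```java
class SleeperSketch {
  private final Object lock = new Object();
  private boolean skip = false;

  /** Sleeps up to periodMs, returning early if skipSleepCycle() is called. */
  long sleep(long periodMs) throws InterruptedException {
    long start = System.nanoTime();
    long deadline = start + periodMs * 1_000_000L;
    synchronized (lock) {
      while (!skip) {
        long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
        if (remainingMs <= 0) {
          break; // period elapsed; also avoids wait(0), which would block forever
        }
        lock.wait(remainingMs);
      }
      skip = false; // the skip request applies to one sleep cycle only
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }

  /** Wakes a thread blocked in sleep() by notifying the monitor it waits on. */
  void skipSleepCycle() {
    synchronized (lock) {
      skip = true;
      lock.notifyAll();
    }
  }
}
```

 Because the skip flag is checked before waiting, a call to skipSleepCycle() that arrives just before sleep() starts still shortens that sleep cycle.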
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at 
 

[jira] [Commented] (HBASE-4811) Support reverse Scan

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155426#comment-13155426
 ] 

stack commented on HBASE-4811:
--

bq. Is there a fundamental reason that HBase only supports forward Scan?

Yes.  All the data is sorted in one direction only and all Scan objects are 
written to go in the data's 'natural' direction.  There is no native support 
for going backwards, whether it's reading from files 'backwards' or getting a 
view on our MemStore that gives a reverse-sort-view.

To make it work, you'd have to write a bunch of code and you'd be always going 
against the grain.

It used to come up the odd time in the early days but versions on the above 
args would usually quiet them.

If you need more detail, ask.

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155453#comment-13155453
 ] 

stack commented on HBASE-4832:
--

And N, you want to uncomment this section now?  This patch wants to do it.

{code}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    // Wakes run() if it is sleeping
    //sleeper.skipSleepCycle();
    //will be uncommented later, see discussion in jira 4798
  }
{code}

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch



[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread Eugene Koontz (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155466#comment-13155466
 ] 

Eugene Koontz commented on HBASE-4832:
--

@stack, that is fine, thanks.
-Eugene

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch



[jira] [Created] (HBASE-4850) hbase tests need to be made Hadoop version agnostic

2011-11-22 Thread Roman Shaposhnik (Created) (JIRA)
hbase tests need to be made Hadoop version agnostic
---

 Key: HBASE-4850
 URL: https://issues.apache.org/jira/browse/HBASE-4850
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.92.0, 0.94.0, 0.92.1
Reporter: Roman Shaposhnik


Currently it is possible to have a single hbase jar that can work with multiple 
versions of Hadoop. It would be nice if hbase-test.jar also followed suit. 
For now I'm aware of the following problems (but there could be more):

1. org.apache.hadoop.hbase.mapreduce.NMapInputFormat is failing because
org.apache.hadoop.mapreduce.JobContext is either class or interface
depending on which version of Hadoop you compile it against.
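
When a type is a class in one library version and an interface in another, code compiled against one shape typically fails at run time against the other with an IncompatibleClassChangeError, because the compiler emits a different call instruction for each. One common workaround is to make the call reflectively so that nothing about the type's shape is fixed at compile time. The sketch below is a generic illustration only: the class VersionAgnosticCall and its methods are hypothetical, it is not taken from any HBase patch, and the usage note relies on JDK types so that it stays self-contained.

```java
import java.lang.reflect.Method;

final class VersionAgnosticCall {
  /** Reports whether the named type is an interface in the version on the classpath. */
  static boolean isInterface(String typeName) throws ClassNotFoundException {
    return Class.forName(typeName).isInterface();
  }

  /**
   * Invokes a public no-argument method declared by the named type on target.
   * The lookup happens at run time, so the same compiled code works whether
   * the named type turns out to be a class or an interface.
   */
  static Object invokeNoArg(Object target, String typeName, String methodName)
      throws Exception {
    Method method = Class.forName(typeName).getMethod(methodName);
    return method.invoke(target);
  }
}
```

For example, invokeNoArg(new java.util.ArrayList<String>(), "java.util.List", "size") looks up size() on the List interface at run time. For the case described above, the type name would be org.apache.hadoop.mapreduce.JobContext, with whatever accessor the test needs.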







[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread Eugene Koontz (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155468#comment-13155468
 ] 

Eugene Koontz commented on HBASE-4832:
--

@stack, I tried with sleeper.skipSleepCycle() both uncommented and commented; the 
test consistently succeeded over 30+ iterations in both cases.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch



[jira] [Created] (HBASE-4851) hadoop maven dependency needs to be an optional one

2011-11-22 Thread Roman Shaposhnik (Created) (JIRA)
hadoop maven dependency needs to be an optional one
---

 Key: HBASE-4851
 URL: https://issues.apache.org/jira/browse/HBASE-4851
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0, 0.94.0, 0.92.1
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik


Given that HBase 0.92/0.94 is likely to be used with at least 3 different 
versions of Hadoop (0.20, 0.22 and 0.23) it seems appropriate to make hadoop 
maven dependencies into optional ones (IOW, the build of HBase will see NO 
changes in behavior, but any component that has HBase as a dependency will be 
in control of what version of Hadoop gets used).





[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4832:
-

Attachment: HBASE-4832.patch

- Removes (timeout=30000) from @Test per nkeywal's suggestion.
- Adds LOG.debug() reporting where the interrupt occurs.
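
The reason for dropping the method-level timeout can be illustrated without JUnit. In the sketch below, an outer deadline plays the role of the test runner's timeout and an inner Thread.sleep plays the role of the wait inside the test. All names (TimeoutSketch, runWithOuterTimeout, waitThenFail) are hypothetical and the durations are arbitrary. When the two durations differ, the source of the failure is predictable; when they are equal, either one may fire first, which is what makes the failure harder to analyze.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {
  /** Runs body on a worker thread under an outer deadline, like a runner-level timeout. */
  static String runWithOuterTimeout(Callable<String> body, long outerMs) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<String> result = pool.submit(body);
      try {
        return result.get(outerMs, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        result.cancel(true); // interrupts the sleeping body
        return "outer timeout";
      }
    } finally {
      pool.shutdownNow();
    }
  }

  /** A body that waits innerMs for something that never happens, then reports it. */
  static Callable<String> waitThenFail(long innerMs) {
    return () -> {
      Thread.sleep(innerMs);
      return "inner wait expired";
    };
  }
}
```

Calling runWithOuterTimeout(waitThenFail(200), 2000) reports the inner wait, and runWithOuterTimeout(waitThenFail(2000), 200) reports the outer timeout. With equal values the outcome depends on scheduling.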

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch



[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread Eugene Koontz (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4832:
-

Attachment: HBASE-4832.patch

git diff --no-prefix

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch



[jira] [Updated] (HBASE-4850) hbase tests need to be made Hadoop version agnostic

2011-11-22 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4850:
-

Priority: Critical  (was: Major)

 hbase tests need to be made Hadoop version agnostic
 ---

 Key: HBASE-4850
 URL: https://issues.apache.org/jira/browse/HBASE-4850
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.92.0, 0.94.0, 0.92.1
Reporter: Roman Shaposhnik
Priority: Critical

 Currently it is possible to have a single hbase jar that can work with 
 multiple versions of Hadoop. It would be nice if hbase-test.jar also followed 
 suit. For now I'm aware of the following problems (but there could be 
 more):
 1. org.apache.hadoop.hbase.mapreduce.NMapInputFormat is failing because
 org.apache.hadoop.mapreduce.JobContext is either class or interface
 depending on which version of Hadoop you compile it against.





[jira] [Commented] (HBASE-3307) Add checkAndPut to the Thrift API

2011-11-22 Thread Bryce Allen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155503#comment-13155503
 ] 

Bryce Allen commented on HBASE-3307:


checkAndPut/checkAndDelete is one of the key features that differentiates HBase 
from other key-value stores. It's a shame that these are only available to 
Java clients, since they are missing from both the Thrift and REST APIs.

 Add checkAndPut to the Thrift API
 -

 Key: HBASE-3307
 URL: https://issues.apache.org/jira/browse/HBASE-3307
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Affects Versions: 0.89.20100924
Reporter: Chris Tarnas
Priority: Minor

 It would be very useful to have the checkAndPut method available via the 
 Thrift API. This would both allow for easier atomic updates as well as cut 
 down on at least one Thrift roundtrip for quite a few common tasks. 
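
 The appeal of a single check-and-put call, as opposed to a separate read followed by a write, can be sketched with an in-memory analogy. The class below is hypothetical and is not the HBase or Thrift API: it models cells as entries in a ConcurrentMap and treats a null expected value as "the cell must be absent", which is a choice made for this sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class CheckAndPutSketch {
  private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

  /**
   * Writes newValue only if the cell currently holds expected.
   * A null expected value means the cell must not exist yet.
   * The check and the write happen as one atomic step.
   */
  boolean checkAndPut(String cell, String expected, String newValue) {
    if (expected == null) {
      return cells.putIfAbsent(cell, newValue) == null;
    }
    return cells.replace(cell, expected, newValue);
  }

  String get(String cell) {
    return cells.get(cell);
  }
}
```

 A client that had to read the cell and then write it in two calls would need two round trips and could still lose a race between them. A single call avoids both problems.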





[jira] [Commented] (HBASE-4811) Support reverse Scan

2011-11-22 Thread John Carrino (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155506#comment-13155506
 ] 

John Carrino commented on HBASE-4811:
-

Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as sstables or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.

-jc


On Tue, Nov 22, 2011 at 12:52 PM, stack (Commented) (JIRA)


 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.





[jira] [Commented] (HBASE-4811) Support reverse Scan

2011-11-22 Thread John Carrino (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155519#comment-13155519
 ] 

John Carrino commented on HBASE-4811:
-

Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as sstables or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.
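
The NavigableSet operations mentioned above can be sketched with plain JDK 
types. This is only an illustration: a ConcurrentSkipListSet of strings stands 
in for KeyValueSkipListSet, ReverseIterationSketch is a hypothetical class, and 
nothing here touches the scanner or HFile layers that a real reverse Scan would 
have to go through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

final class ReverseIterationSketch {
    /** Walks the whole set from the highest key down to the lowest. */
    static List<String> reverse(NavigableSet<String> rows) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = rows.descendingIterator();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    /** Reverse range: from startInclusive (high) down to stopExclusive (low). */
    static List<String> reverseRange(NavigableSet<String> rows,
                                     String startInclusive, String stopExclusive) {
        // subSet takes (low, lowInclusive, high, highInclusive).
        NavigableSet<String> range = rows.subSet(stopExclusive, false, startInclusive, true);
        return new ArrayList<>(range.descendingSet());
    }

    public static void main(String[] args) {
        NavigableSet<String> rows =
            new ConcurrentSkipListSet<>(Arrays.asList("row1", "row2", "row3", "row4"));
        System.out.println(reverse(rows));
        System.out.println(reverseRange(rows, "row4", "row2"));
    }
}
```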

Also, to provide some context: the table we want to scan both ways is 
effectively an index, which will be relatively small and which we would like to 
pin in memory (as much as possible).  It is also likely that this will run 
entirely on solid-state drives, so reverse reads won't be the perf hit they 
would be on spinning disks.



 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.





[jira] [Commented] (HBASE-3307) Add checkAndPut to the Thrift API

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155524#comment-13155524
 ] 

Ted Yu commented on HBASE-3307:
---

@Bryce:
Have you checked out HBASE-1744 ?

 Add checkAndPut to the Thrift API
 -

 Key: HBASE-3307
 URL: https://issues.apache.org/jira/browse/HBASE-3307
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Affects Versions: 0.89.20100924
Reporter: Chris Tarnas
Priority: Minor

 It would be very useful to have the checkAndPut method available via the 
 Thrift API. This would both allow for easier atomic updates as well as cut 
 down on at least one Thrift roundtrip for quite a few common tasks. 





[jira] [Issue Comment Edited] (HBASE-3307) Add checkAndPut to the Thrift API

2011-11-22 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155524#comment-13155524
 ] 

Ted Yu edited comment on HBASE-3307 at 11/22/11 10:46 PM:
--

@Bryce:
Have you checked out HBASE-1744 ?
ThriftHBaseServiceHandler implements the two APIs you mentioned.

It has been integrated into HBase TRUNK.

Once the feature gets validated, we can backport into 0.92 branch.

  was (Author: yuzhih...@gmail.com):
@Bryce:
Have you checked out HBASE-1744 ?
  
 Add checkAndPut to the Thrift API
 -

 Key: HBASE-3307
 URL: https://issues.apache.org/jira/browse/HBASE-3307
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Affects Versions: 0.89.20100924
Reporter: Chris Tarnas
Priority: Minor

 It would be very useful to have the checkAndPut method available via the 
 Thrift API. This would both allow for easier atomic updates as well as cut 
 down on at least one Thrift roundtrip for quite a few common tasks. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155543#comment-13155543
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

I'll file a new issue.

The main issue isn't what is returned, but when.  With the first 'hbck -fix', 
the master makes a call to the regionserver to issue a request to open the 
region (which adds data to meta).  This returns right away.  The next hbck call 
will cause the master to query meta again, which is used to check consistency.  
Sometimes the new meta entries are written before the second hbck call runs 
(the test passes), sometimes they are not (the test fails).  

The slight delay allows the open request to finish and the meta entry to be 
updated before the subsequent 'hbck' call.

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The RS recorded in ZK sometimes stays the same, but 
 sometimes it moves to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155544#comment-13155544
 ] 

Jonathan Hsieh commented on HBASE-4842:
---

Also, I don't think dist log splitting has anything to do with this failure.

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The RS recorded in ZK sometimes stays the same, but 
 sometimes it moves to another RS. 





[jira] [Commented] (HBASE-4842) [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1314#comment-1314
 ] 

stack commented on HBASE-4842:
--

bq. Also, I don't think dist log splitting has anything to do with this failure.

True.  This was a misunderstanding on my part of a J-D comment up on IRC.

 [hbck] Fix intermittent failures on TestHBaseFsck.testHBaseFsck
 ---

 Key: HBASE-4842
 URL: https://issues.apache.org/jira/browse/HBASE-4842
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0, 0.94.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 4842-v3.txt, hbase-4842-breaker.patch, hbase-4842.patch


 It seems that on the 0.92 branch in particular, TestHBaseFsck.testHBaseFsck 
 is intermittently failing.
 In the test, a region's assignment is purposely changed in META but not in 
 ZK.  After the equivalent of 'hbck -fix', a subsequent check that should be 
 clean comes up with a new ZK assignment but with META still being 
 inconsistent with ZK.  The RS recorded in ZK sometimes stays the same, but 
 sometimes it moves to another RS. 





[jira] [Created] (HBASE-4852) Tests that use RegionServer.openRegion such as TestHBaseFsck#testHBaseFsck should call openRegion synchronously

2011-11-22 Thread Jonathan Hsieh (Created) (JIRA)
Tests that use RegionServer.openRegion such as TestHBaseFsck#testHBaseFsck 
should call openRegion synchronously
---

 Key: HBASE-4852
 URL: https://issues.apache.org/jira/browse/HBASE-4852
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Hsieh


Certain test cases like TestHBaseFsck#testHBaseFsck make calls to assign 
regions to region servers and then read meta.  The tests or hbck should be 
modified to make the RegionServer.openRegion call act synchronously.

The main issue isn't what is returned, but when.  Specifically, in 
TestHBaseFsck#testHBaseFsck, on the first 'hbck -fix' the master makes a call 
to the regionserver to issue an asynchronous request to open the region (which 
adds data to meta).  The regionserver returns right away.  The next hbck call 
will cause the master to query meta again, which is used to check consistency.  
A race is exposed -- sometimes the new meta entries are written before the 
second hbck call runs (the test passes), sometimes they are not (the test 
fails).

The hack in HBASE-4842 introduces a slight delay which usually allows the open 
request to finish and the meta entry to be updated before the subsequent 'hbck' 
call.
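
One way a test can avoid depending on a fixed delay is to poll for the 
condition it needs, with an upper bound on the wait. The sketch below shows 
that pattern in isolation; it is an assumption about how such a wait could 
look, not the change made for this issue. WaitForSketch and waitFor are 
hypothetical names, and the AtomicBoolean stands in for "the meta entry has 
been updated".

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

final class WaitForSketch {
    /**
     * Polls the condition every intervalMillis until it holds or until
     * timeoutMillis has elapsed. Returns true if the condition held.
     */
    static boolean waitFor(long timeoutMillis, long intervalMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean regionOpened = new AtomicBoolean(false);
        // Simulates an asynchronous open that completes a little later.
        new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            regionOpened.set(true);
        }).start();
        System.out.println(waitFor(5000, 10, regionOpened::get));
    }
}
```

Compared with a fixed sleep, the wait returns as soon as the condition holds 
and reports a timeout explicitly when it never does.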





[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-22 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155568#comment-13155568
 ] 

jirapos...@reviews.apache.org commented on HBASE-4120:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1421/#review3444
---


initial comments.  haven't had time to look through the whole diff in detail.  
I'm guessing this was made against 0.90?  This is missing integration with a 
lot of new functionality added a while ago into 92.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRPC.java
https://reviews.apache.org/r/1421/#comment7687

should this functionality not be disabled by default?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7696

this pull model seems a little hacky.  why can't we just push this 
information when a region comes online on this server?  we have online schema 
updates, so we can update the table priority with minimal downtime.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7694

this pull model seems a little hacky.  why can't we just push this 
information when a region comes online on this server?  we have online schema 
updates, so we can update the table priority with minimal downtime.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7688

instead of this big switch statement, why don't you add a getPriority() 
function to the Operation base class?  All database operations (gets, puts, 
deletes) are supposed to override this anyways so we can print out fingerprint 
and other debug information.  You can see its use in WritableRpcEngine.java
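
A rough sketch of what the suggestion above could look like, with the switch 
replaced by a virtual call. Everything here is hypothetical: the nested 
Operation, Get and MultiPut classes are local stand-ins rather than the HBase 
types, and the priority constants are made up, assuming only that a smaller 
number means a higher priority.

```java
final class PrioritySketch {
    static final int NORMAL_PRI = 0; // assumed default
    static final int LOW_PRI = 5;    // assumed value for bulk operations

    /** Stand-in base class: every operation reports its own priority. */
    abstract static class Operation {
        int getPriority() {
            return NORMAL_PRI;
        }
    }

    static final class Get extends Operation {
    }

    static final class MultiPut extends Operation {
        @Override
        int getPriority() {
            return LOW_PRI;
        }
    }

    /** The caller no longer needs a switch over operation types. */
    static int priorityOf(Operation op) {
        return op.getPriority();
    }

    public static void main(String[] args) {
        System.out.println(priorityOf(new Get()));
        System.out.println(priorityOf(new MultiPut()));
    }
}
```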



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7689

your intent is to lower the priority of multiputs, correct?  I see that 
HIGHEST_PRI = -10.  If a MultiAction has that priority, won't it make a 
MultiAction<T> have a higher priority than T?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7690

again, I'm confused about why we need to contact the master to get this 
information.  we only care about the regions that are online for this server, 
correct?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityFunction.java
https://reviews.apache.org/r/1421/#comment7691

can you explain this more?  what does this mapping look like?  In other 
words

thread pri 1-10 == system pri X-Y



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
https://reviews.apache.org/r/1421/#comment7700

why are all these accessors needed?  Why can't you just use 

if(this.queue.size()  this.capacity) queueFull.signal()



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
https://reviews.apache.org/r/1421/#comment7698

this should be signal() instead of signalAll().  You will only decrease 
capacity by 1, so you should only wake 1 thread.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
https://reviews.apache.org/r/1421/#comment7692

so, if you've waited too long, you'll add the Job to the queue anyways?  
it's not a tryAdd() then, which would imply that the action could fail.  It's 
just a normal add()



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
https://reviews.apache.org/r/1421/#comment7699

there is a race condition between (size = capacity) and addLock.  
Shouldn't you test the condition again after getting the lock?
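
The queue comments above (signal() rather than signalAll(), a tryAdd() that 
can really fail, and re-testing the condition after taking the lock) can be 
put together in one small sketch. It is a generic bounded queue written for 
illustration, not the PriorityJobQueue under review; BoundedQueueSketch and 
its members are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class BoundedQueueSketch<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();

    BoundedQueueSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Returns false if no slot frees up within the timeout, so the add can fail. */
    boolean tryAdd(T item, long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            // The size check happens under the lock and is repeated after
            // every wakeup, so a stale check made outside the lock is never trusted.
            while (queue.size() >= capacity) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = notFull.awaitNanos(nanos);
            }
            queue.add(item);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Removes one element, or returns null if the queue is empty. */
    T poll() {
        lock.lock();
        try {
            T item = queue.poll();
            if (item != null) {
                notFull.signal(); // one slot was freed, so wake one waiting producer
            }
            return item;
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return queue.size();
        } finally {
            lock.unlock();
        }
    }
}
```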



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/PriorityJobQueue.java
https://reviews.apache.org/r/1421/#comment7701

LOG.debug


- Nicolas


On 2011-11-22 06:06:45, Jia Liu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1421/
bq.  ---
bq.  
bq.  (Updated 2011-11-22 06:06:45)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Patch used for table priority alone,In this patch, not only tables can 
have 

[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread nkeywal (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155570#comment-13155570
 ] 

nkeywal commented on HBASE-4832:


FYI, the patch for the region server itself is in HBASE-4833; if the trunk has 
changed I will update the patch.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
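
 The effect described above, a notifyAll() on one monitor not waking a thread
 that waits on another, can be reproduced with a small standalone sketch. The
 Sleeper class below is a hypothetical stand-in written for illustration and is
 not the HBase Sleeper implementation; only the method name skipSleepCycle is
 taken from the description.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class WakeupSketch {
    /** Stand-in sleeper that waits on its own private monitor. */
    static final class Sleeper {
        private final Object lock = new Object();
        private boolean skip = false;

        /** Waits up to millis; returns true only if cut short by skipSleepCycle(). */
        boolean sleep(long millis) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
            synchronized (lock) {
                while (!skip) {
                    long remaining = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remaining <= 0) {
                        return false; // slept the full period
                    }
                    lock.wait(remaining);
                }
                skip = false;
                return true;
            }
        }

        void skipSleepCycle() {
            synchronized (lock) {
                skip = true;
                lock.notifyAll(); // same monitor the sleeping thread waits on
            }
        }
    }

    /** A waker that notifies on an arbitrary monitor, as stop() did with "this". */
    static Runnable notifyOn(Object monitor) {
        return () -> {
            synchronized (monitor) {
                monitor.notifyAll();
            }
        };
    }

    /** Starts one sleep on a background thread, runs the waker, reports whether the sleep ended early. */
    static boolean wokenEarly(Sleeper sleeper, Runnable waker, long sleepMillis) {
        AtomicBoolean early = new AtomicBoolean(false);
        Thread t = new Thread(() -> {
            try {
                early.set(sleeper.sleep(sleepMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        try {
            Thread.sleep(50); // give the background thread time to start waiting
            waker.run();
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return early.get();
    }

    public static void main(String[] args) {
        Sleeper sleeper = new Sleeper();
        System.out.println(wokenEarly(sleeper, notifyOn(new Object()), 300));
        System.out.println(wokenEarly(sleeper, sleeper::skipSleepCycle, 5000));
    }
}
```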
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that 
 fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec  <<< ERROR!
 java.lang.Exception: test timed out after 30000 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at 
 org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at 
 org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
   at 
 

[jira] [Commented] (HBASE-4811) Support reverse Scan

2011-11-22 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155573#comment-13155573
 ] 

stack commented on HBASE-4811:
--

I'd suggest you spend more time w/ the code base to see how much effort a 
reverse scan would require.  (Superficially, yes, our MemStore is a 
NavigableSet, but that is not what the client interacts with; ditto our 
sstable-like hfile thing.  IIRC leveldb counsels that the reverse range is 
going against the grain and at a minimum is much slower than the natural scan.)

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.





[jira] [Updated] (HBASE-4811) Support reverse Scan

2011-11-22 Thread Nicolas Spiegelberg (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-4811:
---

Comment: was deleted

(was: Yeah, I'm not that familiar with the codebase, but I'd assume that in
order to get forward scans you'd have to have the data sorted. And
from what I understand it is internally stored as sstables or
HFiles. If you have it sorted to scan in one direction, it seems
pretty easy to go the other direction.  LevelDb uses ssTables and
supports reverse ranges.

The only thing that I could think of from the design (from a high
level) that might make it difficult to do reverse ranges is dealing
with splitting ranges when moving ranges from one region server to
another.

Just from a quick look at MemStore that you mention, it uses a
KeyValueSkipListSet under the covers that is a NavigableSet and
supports descendingSet and descendingIterator.

-jc


On Tue, Nov 22, 2011 at 12:52 PM, stack (Commented) (JIRA)
)

 Support reverse Scan
 

 Key: HBASE-4811
 URL: https://issues.apache.org/jira/browse/HBASE-4811
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.20.6
Reporter: John Carrino

 All the documentation I find about HBase says that if you want forward and 
 reverse scans you should just build 2 tables, one ascending and one 
 descending.  Is there a fundamental reason that HBase only supports forward 
 Scan?  It seems like a lot of extra space overhead and coding overhead (to 
 keep them in sync) to support 2 tables.  
 I am assuming this has been discussed before, but I can't find the 
 discussions anywhere about it or why it would be infeasible.





[jira] [Commented] (HBASE-4848) TestScanner failing because hostname can't be null

2011-11-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155576#comment-13155576
 ] 

Hudson commented on HBASE-4848:
---

Integrated in HBase-0.92 #161 (See 
[https://builds.apache.org/job/HBase-0.92/161/])
HBASE-4848 TestScanner failing because hostname can't be null

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanner.java


 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5

 Attachments: 4848-092.txt, 4848.txt








[jira] [Commented] (HBASE-4849) TestCatalogTracker can fail if an existing zookeeper running

2011-11-22 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155575#comment-13155575
 ] 

Hudson commented on HBASE-4849:
---

Integrated in HBase-0.92 #161 (See 
[https://builds.apache.org/job/HBase-0.92/161/])
HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running
HBASE-4849 TestCatalogTracker can fail if an existing zookeeper running

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

stack : 
Files : 
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java


 TestCatalogTracker can fail if an existing zookeeper running
 

 Key: HBASE-4849
 URL: https://issues.apache.org/jira/browse/HBASE-4849
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4849.txt


 This fact sunk my attempt at building an RC.  Fix.





[jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-22 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13155574#comment-13155574
 ] 

Hadoop QA commented on HBASE-4832:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12504809/HBASE-4832.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/339//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/339//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/339//console

This message is automatically generated.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: Eugene Koontz
Priority: Minor
 Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, 
 HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch, HBASE-4832.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 about 0.5 s faster on average, which is quite useful for unit tests.
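
A minimal sketch of why a notifyAll() on the wrong monitor wakes nobody, assuming the run loop waits on a separate lock object. MonitorDemo and all of its members are hypothetical and are not HBase code; skipSleepCycle() here only mimics the idea behind the real sleeper.skipSleepCycle().

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a server whose run loop sleeps on a private lock. */
final class MonitorDemo {
  private final Object sleepLock = new Object(); // the monitor the run loop waits on
  private boolean woken = false;                 // guarded by sleepLock

  /** Waits on sleepLock for up to periodMs; returns the elapsed milliseconds. */
  long sleep(long periodMs) throws InterruptedException {
    long start = System.nanoTime();
    synchronized (sleepLock) {
      long remainingMs = periodMs;
      while (!woken && remainingMs > 0) {
        sleepLock.wait(remainingMs);
        remainingMs = periodMs - TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      }
    }
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
  }

  /** Like the original stop(): notifies on 'this', which no thread waits on. */
  synchronized void notifyWrongObject() {
    notifyAll();
  }

  /** Like the fix: wakes the thread that waits on sleepLock. */
  void skipSleepCycle() {
    synchronized (sleepLock) {
      woken = true;
      sleepLock.notifyAll();
    }
  }

  /** Starts a 1000 ms sleep, signals after 100 ms, returns how long the sleep lasted. */
  static long measure(boolean useFix) {
    MonitorDemo demo = new MonitorDemo();
    long[] elapsed = new long[1];
    Thread sleeper = new Thread(() -> {
      try {
        elapsed[0] = demo.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    sleeper.start();
    try {
      Thread.sleep(100); // give the sleeper time to start waiting
      if (useFix) {
        demo.skipSleepCycle();
      } else {
        demo.notifyWrongObject();
      }
      sleeper.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return elapsed[0];
  }

  public static void main(String[] args) {
    System.out.println("notify on wrong monitor: " + measure(false) + " ms");
    System.out.println("notify on sleep lock:    " + measure(true) + " ms");
  }
}
```

In this sketch the first measurement should come out close to the full 1000 ms period and the second close to the 100 ms delay; exact figures depend on scheduling.
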
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 This is likely because the code does not expect the region server to stop that fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  <<< ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
   at java.lang.Throwable.fillInStackTrace(Native Method)
   at java.lang.Throwable.<init>(Throwable.java:196)
   at java.lang.Exception.<init>(Exception.java:41)
   at java.lang.InterruptedException.<init>(InterruptedException.java:48)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
   at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
   at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
   at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
   at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)

[jira] [Created] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-22 Thread stack (Created) (JIRA)
HBASE-4789 does overzealous pruning of seqids
-

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical


Working w/ J-D on failing replication test turned up hole in seqids made by the 
patch over in hbase-4789.  With this patch in place we see lots of instances of 
the suspicious: 'Last sequenceid written is empty. Deleting all old hlogs'

At a minimum, these lines need removing:

{code}
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
index 623edbe..a0bbe01 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
@@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
       // Cleaning up of lastSeqWritten is in the finally clause because we
       // don't want to confuse getOldestOutstandingSeqNum()
       this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
-      Long l = this.lastSeqWritten.remove(encodedRegionName);
-      if (l != null) {
-        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
-          Bytes.toString(encodedRegionName) + ", seqid=" + l);
-      }
       this.cacheFlushLock.unlock();
     }
   }
{code}

... but above is no good w/o figuring why WALs are not being rotated off.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-22 Thread Jean-Daniel Cryans (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155578#comment-13155578
 ] 

Jean-Daniel Cryans commented on HBASE-4853:
---

An example of data loss.

First, we see a flush with seqid being 1389, but the message "Why is there a raw 
encodedRegionName in lastSeqWritten" means it deleted 1389.

{quote}
2011-11-22 11:21:08,526 WARN  
[RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] wal.HLog(1364): Why 
is there a raw encodedRegionName in lastSeqWritten? 
name=a9b1d96a4554f545a74142bc42f8c48e, seqid=1389
2011-11-22 11:21:08,526 INFO  
[RegionServer:1;hbasedev,54530,1321989619241.cacheFlusher] 
regionserver.HRegion(1259): Finished memstore flush of ~654.3k for region 
test,,1321989625594.a9b1d96a4554f545a74142bc42f8c48e. in 493ms, 
sequenceid=1388, compaction requested=false
{quote}

HLog roll happens, it sees 0 last seq id so it clears all the WALs.

{quote}
2011-11-22 11:21:08,932 INFO  
[RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(582): Roll 
/user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490,
 entries=41, filesize=25596.  for 
/user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989668526
2011-11-22 11:21:08,932 DEBUG 
[RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(593): Last 
sequenceid written is empty. Deleting all old hlogs
{quote}

The last one had max seq id as 1424, which was never flushed.

{quote}
2011-11-22 11:21:08,967 INFO  
[RegionServer:1;hbasedev,54530,1321989619241.logRoller] wal.HLog(823): moving 
old hlog file 
/user/jdcryans/.logs/hbasedev,54530,1321989619241/hbasedev%2C54530%2C1321989619241.1321989667490
 whose highest sequenceid is 1424 to 
/user/jdcryans/.oldlogs/hbasedev%2C54530%2C1321989619241.1321989667490
{quote}
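
The sequence in the logs above can be modelled with a much simplified sketch. This is not the real HLog: SeqIdModel, its method bodies, the "snp" key prefix and the starting sequence id 1 are all assumptions made for illustration. Only the two remove(...) calls and the empty-map check mirror what the patch and the log messages show.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the lastSeqWritten bookkeeping; not the real HLog. */
final class SeqIdModel {
  private final Map<String, Long> lastSeqWritten = new HashMap<>();
  private final boolean pruneRawKey; // true = keep the lines the diff proposes to remove

  SeqIdModel(boolean pruneRawKey) {
    this.pruneRawKey = pruneRawKey;
  }

  // Assumed naming scheme for the snapshot key.
  private static String snapshotName(String region) {
    return "snp" + region;
  }

  /** Records the oldest unflushed sequence id for the region. */
  void append(String region, long seqId) {
    lastSeqWritten.putIfAbsent(region, seqId);
  }

  /** Assumption: starting a flush moves the region's entry under a snapshot key. */
  void startCacheFlush(String region) {
    Long seqId = lastSeqWritten.remove(region);
    if (seqId != null) {
      lastSeqWritten.put(snapshotName(region), seqId);
    }
  }

  void completeCacheFlush(String region) {
    lastSeqWritten.remove(snapshotName(region));
    if (pruneRawKey) {
      // Also drops the entry for edits that arrived while the flush was running.
      lastSeqWritten.remove(region);
    }
  }

  /** An empty map corresponds to "Last sequenceid written is empty. Deleting all old hlogs". */
  boolean rollWouldDeleteAllOldLogs() {
    return lastSeqWritten.isEmpty();
  }

  static boolean scenario(boolean pruneRawKey) {
    SeqIdModel wal = new SeqIdModel(pruneRawKey);
    String region = "a9b1d96a4554f545a74142bc42f8c48e";
    wal.append(region, 1L);
    wal.startCacheFlush(region);
    wal.append(region, 1389L); // edit written while the flush is in progress
    wal.completeCacheFlush(region);
    return wal.rollWouldDeleteAllOldLogs();
  }

  public static void main(String[] args) {
    System.out.println("with raw-key pruning:    " + scenario(true));  // prints true
    System.out.println("without raw-key pruning: " + scenario(false)); // prints false
  }
}
```

In this model the pruning variant leaves the map empty even though edit 1389 was never flushed, so a log roll would consider every old WAL deletable.
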





[jira] [Created] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-22 Thread Roman Shaposhnik (Created) (JIRA)
it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
--

 Key: HBASE-4854
 URL: https://issues.apache.org/jira/browse/HBASE-4854
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik


It looks like HBASE-3465 introduced a slight change in behavior. The ordering 
of classpath elements makes Hadoop ones go before the HBase ones, which leads 
to log4j properties picked up from the wrong place, etc. It seems that the 
easiest way to fix that would be to revert the ordering of the classpath.
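
A small sketch of why the ordering matters, assuming resources are resolved by taking the first classpath entry that contains them. ClasspathOrderDemo, the directory names and the contents map are hypothetical; this is not how the HBase shell scripts build the classpath.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of first-match resource lookup along an ordered classpath. */
final class ClasspathOrderDemo {
  /** Returns the first entry that contains the resource, or null if none does. */
  static String resolve(List<String> entries, Map<String, Set<String>> contents, String resource) {
    for (String entry : entries) {
      if (contents.getOrDefault(entry, Set.of()).contains(resource)) {
        return entry;
      }
    }
    return null;
  }

  /** Made-up layout: both configuration directories ship a log4j.properties. */
  static Map<String, Set<String>> demoContents() {
    return Map.of(
        "hadoop/conf", Set.of("log4j.properties"),
        "hbase/conf", Set.of("log4j.properties"));
  }

  public static void main(String[] args) {
    Map<String, Set<String>> contents = demoContents();
    // Hadoop entries first: the Hadoop copy wins.
    System.out.println(resolve(List.of("hadoop/conf", "hbase/conf"), contents, "log4j.properties"));
    // HBase entries first: the HBase copy wins.
    System.out.println(resolve(List.of("hbase/conf", "hadoop/conf"), contents, "log4j.properties"));
  }
}
```
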





[jira] [Commented] (HBASE-4120) isolation and allocation

2011-11-22 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155584#comment-13155584
 ] 

Ted Yu commented on HBASE-4120:
---

@Nicolas:
Thanks for your valuable review.

This feature was developed independently of HBASE-1730/HBASE-4213.
Do you think priority refreshing can be done in a separate JIRA ?

 isolation and allocation
 

 Key: HBASE-4120
 URL: https://issues.apache.org/jira/browse/HBASE-4120
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
Reporter: Liu Jia
Assignee: Liu Jia
 Fix For: 0.94.0

 Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
 Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
 HBase_isolation_and_allocation_user_guide.pdf, 
 Performance_of_Table_priority.pdf, System Structure.jpg, TablePriority.patch, 
 TablePriority_v8.patch, TablePriority_v8.patch, 
 TablePriority_v8_for_trunk.patch


 The HBase isolation and allocation tool is designed to help users manage 
 cluster resources among different applications and tables.
 Running many applications on one large HBase cluster causes a number of 
 problems. At Taobao there is a cluster that many departments use to test the 
 performance of their HBase-based applications. Although this cluster has 12 
 servers, only one application can run on it at a time, exclusively, and the 
 other applications must wait until the previous test has finished.
 After we add the allocation management function to the cluster, applications 
 can share the cluster and run concurrently. If a test engineer wants to 
 make sure there is no interference, he/she can move the other tables out of 
 this group.
 Within a group we use table priority to allocate resources: when the system 
 is busy, we can make sure high-priority tables are not affected by 
 lower-priority tables.
 Different groups can have different region server configurations: groups 
 optimized for reading can have a large block cache, and groups optimized 
 for writing can have a large memstore. 
 Tables and region servers can be moved easily between groups; after changing 
 the configuration, a group can be restarted alone instead of restarting the 
 whole cluster.
 git entry : https://github.com/ICT-Ope/HBase_allocation .
 We hope our work is helpful.





[jira] [Updated] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-22 Thread Roman Shaposhnik (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4854:


Status: Patch Available  (was: Open)

 it seems that CLASSPATH elements coming from Hadoop change HBase behaviour
 --

 Key: HBASE-4854
 URL: https://issues.apache.org/jira/browse/HBASE-4854
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: HBASE-4854.patch.txt


 It looks like HBASE-3465 introduced a slight change in behavior. The ordering 
 of classpath elements makes Hadoop ones go before the HBase ones, which leads 
 to log4j properties picked up from the wrong place, etc. It seems that the 
 easiest way to fix that would be to revert the ordering of the classpath.





[jira] [Updated] (HBASE-4854) it seems that CLASSPATH elements coming from Hadoop change HBase behaviour

2011-11-22 Thread Roman Shaposhnik (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4854:


Attachment: HBASE-4854.patch.txt





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-22 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155596#comment-13155596
 ] 

Lars Hofhansl commented on HBASE-4838:
--

This still fails:
{noformat}
testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide)  Time elapsed: 12.233 sec  <<< FAILURE!
java.lang.AssertionError: expected:<17576> but was:<28064>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at org.apache.hadoop.hbase.client.TestFromClientSide.assertRowCount(TestFromClientSide.java:528)
at org.apache.hadoop.hbase.client.TestFromClientSide.testFilterAcrossMultipleRegions(TestFromClientSide.java:436)
{noformat}

I went through the entire diff between 0.92 with this patch and trunk. There is 
nothing that sticks out that could be causing this.

This problem can be easily reproduced by reducing the number of rows generated in 
HBaseTestingUtility.loadTable(...) to (note: replace 'z' with 'a' and 'b'):
{noformat}
  public int loadTable(final HTable t, final byte[] f) throws IOException {
    t.setAutoFlush(false);
    byte[] k = new byte[3];
    int rowCount = 0;
    for (byte b1 = 'a'; b1 <= 'a'; b1++) {
      for (byte b2 = 'a'; b2 <= 'a'; b2++) {
        for (byte b3 = 'a'; b3 <= 'b'; b3++) {
          k[0] = b1;
          k[1] = b2;
          k[2] = b3;
          Put put = new Put(k);
          put.add(f, null, k);
          t.put(put);
          rowCount++;
        }
      }
    }
    t.flushCommits();
    return rowCount;
  }
{noformat}
This generates only two rows for this test, which makes it easier to debug.
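
For reference, the expected row counts can be checked without a cluster. The sketch below only reproduces the loop bounds of loadTable; RowKeyCount and its count method are hypothetical helpers, not part of HBaseTestingUtility.

```java
/** Hypothetical helper that counts the three-byte row keys a loadTable-style loop generates. */
final class RowKeyCount {
  /** Each key byte ranges from 'a' to its upper bound, inclusive. */
  static int count(char max1, char max2, char max3) {
    int rows = 0;
    for (char b1 = 'a'; b1 <= max1; b1++) {
      for (char b2 = 'a'; b2 <= max2; b2++) {
        for (char b3 = 'a'; b3 <= max3; b3++) {
          rows++;
        }
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(count('z', 'z', 'z')); // prints 17576 (26 * 26 * 26), the full table
    System.out.println(count('a', 'a', 'b')); // prints 2, the reduced table
  }
}
```
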

The same test then fails with:
{noformat}
java.lang.AssertionError: expected:<2> but was:<4>
...
{noformat}

Somehow it is scanning more rows after a split than it is supposed to, repeating 
some of the same rows. If somebody else wants to have a look at it, I could use 
another pair of eyes.


 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because this 
 is not trivial.




