[jira] [Created] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-20 Thread nkeywal (Created) (JIRA)
TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops 
too fast
---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Priority: Minor


The current implementation of HRegionServer#stop is

{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    synchronized (this) {
      // Wakes run() if it is sleeping
      notifyAll(); // FindBugs NN_NAKED_NOTIFY
    }
  }
{noformat}

The notification is sent on the wrong object and does nothing. As a 
consequence, the region server continues to sleep instead of waking up and 
stopping immediately. A correct implementation is:

{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    // Wakes run() if it is sleeping
    sleeper.skipSleepCycle();
  }
{noformat}

Then the region server stops immediately. This makes the region server stop 
0,5s faster on average, which is quite useful for unit tests.
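The behaviour behind the fix can be sketched as follows. This is a simplified stand-in for illustration, not HBase's actual Sleeper class: what matters is that skipSleepCycle() notifies the same monitor the sleeping thread waits on, whereas the original notifyAll() on the region server itself targets a monitor nobody is waiting on.

```java
// Minimal sketch of the sleeper pattern: the stopper thread notifies the
// SAME lock object the sleeping thread waits on, so the wake-up is effective.
// Names are simplified stand-ins, not the real HBase implementation.
public class SleeperSketch {

    static class Sleeper {
        private final Object lock = new Object();
        private boolean skip = false;
        private final long periodMs;

        Sleeper(long periodMs) { this.periodMs = periodMs; }

        /** Sleep for one period, unless skipSleepCycle() cuts it short. */
        void sleep() throws InterruptedException {
            long deadline = System.currentTimeMillis() + periodMs;
            synchronized (lock) {
                long remaining;
                while (!skip && (remaining = deadline - System.currentTimeMillis()) > 0) {
                    lock.wait(remaining); // waits on 'lock', the object notified below
                }
                skip = false; // reset for the next cycle
            }
        }

        /** Wakes a thread blocked in sleep(); notifying any other object would do nothing. */
        void skipSleepCycle() {
            synchronized (lock) {
                skip = true;
                lock.notifyAll();
            }
        }
    }

    /** Returns how long a 5s sleep actually took when skipped after ~50ms. */
    public static long timedSkippedSleep() throws InterruptedException {
        Sleeper sleeper = new Sleeper(5000);
        Thread stopper = new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            sleeper.skipSleepCycle();
        });
        long start = System.currentTimeMillis();
        stopper.start();
        sleeper.sleep(); // returns early instead of burning the full 5s
        stopper.join();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("slept for ~" + timedSkippedSleep() + " ms instead of 5000");
    }
}
```

With the broken version, the notifyAll() runs on the region server's own monitor, the sleeping thread never sees it, and the stop only takes effect once the current sleep period expires on its own.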

However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not 
work. It is likely because the test code does not expect the region server to 
stop that fast.

The exception is:
{noformat}
testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
  Time elapsed: 30.06 sec   ERROR!
java.lang.Exception: test timed out after 3 milliseconds
    at java.lang.Throwable.fillInStackTrace(Native Method)
    at java.lang.Throwable.<init>(Throwable.java:196)
    at java.lang.Exception.<init>(Exception.java:41)
    at java.lang.InterruptedException.<init>(InterruptedException.java:48)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
    at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
    at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
    at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
    at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
    at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    ...
{noformat}

[jira] [Updated] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4832:
---

Attachment: 4832_trunk_hregionserver.patch

4832_trunk_hregionserver.patch contains the fix on HRegionServer which makes 
the coprocessor test fail.

 TestRegionServerCoprocessorExceptionWithAbort fails if the region server 
 stops too fast
 ---

 Key: HBASE-4832
 URL: https://issues.apache.org/jira/browse/HBASE-4832
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.94.0
Reporter: nkeywal
Priority: Minor
 Attachments: 4832_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0,5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work. It is likely because the test code does not expect the region 
 server to stop that fast.
 The exception is:
 {noformat}
 testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)
   Time elapsed: 30.06 sec   ERROR!
 java.lang.Exception: test timed out after 3 milliseconds
     at java.lang.Throwable.fillInStackTrace(Native Method)
     at java.lang.Throwable.<init>(Throwable.java:196)
     at java.lang.Exception.<init>(Exception.java:41)
     at java.lang.InterruptedException.<init>(InterruptedException.java:48)
     at java.lang.Thread.sleep(Native Method)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
     at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
     at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
     at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
     at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
     at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
     at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
     at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
     at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
     ...
 {noformat}

[jira] [Created] (HBASE-4833) HRegionServer stops could be 0,5s faster

2011-11-20 Thread nkeywal (Created) (JIRA)
HRegionServer stops could be 0,5s faster


 Key: HBASE-4833
 URL: https://issues.apache.org/jira/browse/HBASE-4833
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor


The current implementation of HRegionServer#stop is

{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    synchronized (this) {
      // Wakes run() if it is sleeping
      notifyAll(); // FindBugs NN_NAKED_NOTIFY
    }
  }
{noformat}

The notification is sent on the wrong object and does nothing. As a 
consequence, the region server continues to sleep instead of waking up and 
stopping immediately. A correct implementation is:

{noformat}
  public void stop(final String msg) {
    this.stopped = true;
    LOG.info("STOPPED: " + msg);
    // Wakes run() if it is sleeping
    sleeper.skipSleepCycle();
  }
{noformat}

Then the region server stops immediately. This makes the region server stop 
0,5s faster on average, which is quite useful for unit tests.

However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does not 
work. It is likely because the test code does not expect the region server to 
stop that fast. See HBASE-4832



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4833) HRegionServer stops could be 0,5s faster

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4833:
---

Attachment: 4833_trunk_hregionserver.patch

 HRegionServer stops could be 0,5s faster
 

 Key: HBASE-4833
 URL: https://issues.apache.org/jira/browse/HBASE-4833
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4833_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0,5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work. It is likely because the test code does not expect the region 
 server to stop that fast. See HBASE-4832





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Attachment: 4798_trunk_all.v10.patch

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 
 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.
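Several of the items above replace untimed waits with a "timeout on the wait". The hazard and the fix can be sketched as follows, with invented names rather than the actual HBase classes: if the notification fires before the waiter reaches wait(), an untimed wait() can block forever, while a bounded wait inside a condition loop re-checks the flag and cannot get stuck.

```java
// Sketch of the timed-wait pattern: markReady() may run before anyone waits,
// and waitForReady() still returns promptly because it checks the flag in a
// loop and never waits unbounded. Names are hypothetical stand-ins.
public class TimedWaitSketch {
    private final Object lock = new Object();
    private boolean ready = false;

    /** Signal completion; safe even if no thread is waiting yet. */
    public void markReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /** Waits until markReady() was called or maxMs elapsed; returns the flag. */
    public boolean waitForReady(long maxMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxMs;
        synchronized (lock) {
            while (!ready) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) break;           // give up after the timeout
                lock.wait(Math.min(remaining, 100)); // bounded wait: no lost-notify hang
            }
            return ready;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TimedWaitSketch w = new TimedWaitSketch();
        w.markReady();                            // notification arrives BEFORE the wait
        System.out.println(w.waitForReady(1000)); // still returns promptly
    }
}
```

The same shape covers both directions listed above: a waiter that missed the notification is released by the timeout, and a waiter that arrives after the notification sees the flag already set and never blocks.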





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Open  (was: Patch Available)

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 
 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Patch Available  (was: Open)

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 
 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.





[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153766#comment-13153766
 ] 

Hadoop QA commented on HBASE-4798:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504426/4798_trunk_all.v10.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -166 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 59 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.replication.TestReplication

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/314//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/314//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/314//console

This message is automatically generated.

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 4798_trunk_all.v5.patch, 
 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153774#comment-13153774
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

On trunk, TestAcidGuarantees ran for a solid day and a half (33+ hours) without 
failing.  

larsh@ I'll loop the 0.92 version and let it run through today and report how 
it fared around midday monday.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
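The first proposed solution, carrying the memstoreTS into the HFile so a reader can keep one consistent read point across a flush, can be illustrated with a toy model. This is not HBase code; the class and field names below are invented for illustration only.

```java
// Toy model of read-point filtering: a reader pins a read point once and
// filters every cell by it, so cells written (or flushed) later never leak
// into the row it returns. Names are invented, not HBase's internals.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadPointSketch {
    static class Cell {
        final String column;
        final long writeTs; // stand-in for the memstoreTS of the writing transaction
        Cell(String column, long writeTs) { this.column = column; this.writeTs = writeTs; }
    }

    /** Returns columns visible at the snapshot taken at readPoint. */
    static List<String> snapshotRead(List<Cell> row, long readPoint) {
        List<String> visible = new ArrayList<>();
        for (Cell c : row) {
            if (c.writeTs <= readPoint) visible.add(c.column); // one consistent cut
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("a", 1));
        row.add(new Cell("b", 1));
        long readPoint = 1;            // reader pins its view here
        row.add(new Cell("a", 2));     // concurrent writer updates column a
        // Even if a flush happens mid-read, the pinned read point filters out
        // the ts=2 cell, so the reader never sees a half-committed row:
        System.out.println(snapshotRead(row, readPoint)); // [a, b]
    }
}
```

The bug described above corresponds to losing writeTs at flush time: without it, the ts=2 cell would be indistinguishable from the committed ones and the second half of the read would return the newer value for column a.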





[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Open  (was: Patch Available)

 Sleeps and synchronisation improvements for tests
 -

 Key: HBASE-4798
 URL: https://issues.apache.org/jira/browse/HBASE-4798
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 
 4798_trunk_all.v10.patch, 4798_trunk_all.v10.patch, 4798_trunk_all.v2.patch, 
 4798_trunk_all.v5.patch, 4798_trunk_all.v6.patch, 4798_trunk_all.v7.patch


 Multiple small changes:
 @committers: Removing some sleeps made visible a bug in 
 JVMClusterUtil#HMaster#waitForServerOnline, so I had to add a synchro point. 
 You may want to review this.
 JVMClusterUtil#HMaster#waitForServerOnline: removed, the condition was never 
 met (test on !c && !!c). Added a new synchronization point.
 AssignmentManager#waitForAssignment: add a timeout on the wait => not stuck 
 if the notification is received before the wait.
 HMaster#loop: use a notification instead of a 1s sleep
 HRegionServer#waitForServerOnline: new method used by 
 JVMClusterUtil#waitForServerOnline() to replace a 1s sleep by a notification
 HRegionServer#getMaster() 1s sleeps replaced by one 0,1s sleep and one 0,2s 
 sleep
 HRegionServer#stop: use a notification on sleeper to lower shutdown by 0,5s
 ZooKeeperNodeTracker#start: replace a recursive call by a loop
 ZooKeeperNodeTracker#blockUntilAvailable: add a timeout on the wait => not 
 stuck if the notification is received before the wait.
 HBaseTestingUtility#expireSession: use a timeout of 1s instead of 5s
 TestZooKeeper#testClientSessionExpired: use a timeout of 1s instead of 5s, 
 with the change on HBaseTestingUtility we are 60s faster
 TestRegionRebalancing#waitForAllRegionsAssigned: use a sleep of 0,2s instead 
 of 1s
 TestRestartCluster#testClusterRestart: send all the table creation together, 
 then check creation, should be faster
 TestHLog: shutdown the whole cluster instead of DFS only (more standard) 
 JVMClusterUtil#startup: lower the sleep from 1s to 0,1s
 HConnectionManager#close: Zookeeper name in debug message from 
 HConnectionManager after connection close was always null because it was set 
 to null in the delete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Attachment: 4798_trunk_all.v10.patch






[jira] [Updated] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread nkeywal (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4798:
---

Status: Patch Available  (was: Open)

I tend to think that the patch is ok and that the errors we're seeing are the 
usual flaky stuff, but let's try again. 






[jira] [Commented] (HBASE-4815) Disable online altering by default, create a config for it

2011-11-20 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153828#comment-13153828
 ] 

ramkrishna.s.vasudevan commented on HBASE-4815:
---

Thanks Stack. I should have completed the verification of all the test cases; 
I usually do. As it was Friday, I left the office before it could finish. 
I'll be more careful next time.

 Disable online altering by default, create a config for it
 --

 Key: HBASE-4815
 URL: https://issues.apache.org/jira/browse/HBASE-4815
 Project: HBase
  Issue Type: Task
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: ramkrishna.s.vasudevan
Priority: Blocker
 Fix For: 0.92.0

 Attachments: 4815-v2.txt, 4815.addendum, 4815.patch


 There's a whole class of bugs that we've been revealing from trying out 
 online altering in conjunction with other operations like splitting. 
 HBASE-4729, HBASE-4794, and HBASE-4814 are examples.
 It's not so much that the online altering code is buggy, but that it wasn't 
 tested in an environment that permits splitting.
 I think we should mark online altering as experimental in 0.92 and add a 
 config to enable it (so it would be disabled by default, requiring people to 
 enable for altering table schema).





[jira] [Commented] (HBASE-4798) Sleeps and synchronisation improvements for tests

2011-11-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153831#comment-13153831
 ] 

Hadoop QA commented on HBASE-4798:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504430/4798_trunk_all.v10.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 21 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -166 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 59 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/315//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/315//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/315//console

This message is automatically generated.






[jira] [Created] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Linden Hillenbrand (Created) (JIRA)
CopyTable: Cannot have ZK source to destination
---

 Key: HBASE-4834
 URL: https://issues.apache.org/jira/browse/HBASE-4834
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: Linden Hillenbrand


During a CopyTable run involving --peer.adr, we found the following block of 
code:

    if (address != null) {
      ZKUtil.applyClusterKeyToConf(this.conf, address);
    }

The ZK configuration we set in the setConf method is also applied on the 
frontend when MapReduce initializes TableOutputFormat (TOF), so there is now 
no way to have two ZK quorums for a single job: the source quorum gets reset 
before the job is submitted.
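A self-contained sketch of the clobbering described above, using a plain map as a stand-in for the Hadoop Configuration; the key name is real, but the method and class names here are simplified illustrations, not HBase's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Demonstrates how applying the peer cluster key in setConf() overwrites
// the source cluster's quorum before the job is ever submitted.
class QuorumClobberDemo {
    static final String QUORUM_KEY = "hbase.zookeeper.quorum";

    // Simplified stand-in for TableOutputFormat#setConf applying --peer.adr.
    static void setConf(Map<String, String> conf, String peerAddress) {
        if (peerAddress != null) {
            conf.put(QUORUM_KEY, peerAddress);  // overwrites the source quorum
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(QUORUM_KEY, "source-zk:2181");   // cluster being read from

        // MapReduce calls setConf on the frontend during output-spec checks,
        // before job.xml is written -- so the source address is already lost.
        setConf(conf, "dest-zk:2181");

        System.out.println("quorum now: " + conf.get(QUORUM_KEY));
    }
}
```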





[jira] [Updated] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Linden Hillenbrand (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Linden Hillenbrand updated HBASE-4834:
--

Priority: Critical  (was: Major)






[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Linden Hillenbrand (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153837#comment-13153837
 ] 

Linden Hillenbrand commented on HBASE-4834:
---

Going to move the block to TableOutputFormat#getRecordWriter so that it is 
called at the right time; moving the entire block out of setConf in 
TableOutputFormat.java.

Harsh is going to test it out locally; I will submit a patch shortly.






[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Harsh J (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153839#comment-13153839
 ] 

Harsh J commented on HBASE-4834:


The frontend calls setConf of TableOutputFormat when it is initialized for 
output-spec checks upon job.submit(), and that reloads the actual ZK keys in 
job.xml itself (which, in Hadoop, is written _after_ checkOutputSpecs(…) and 
such).






[jira] [Resolved] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HBASE-4834.


Resolution: Duplicate

This was fixed by HBASE-3497. Resolving as dup.

Apologies for the noise, and for the confusion Linden!

Regards,
Harsh






[jira] [Commented] (HBASE-4834) CopyTable: Cannot have ZK source to destination

2011-11-20 Thread Linden Hillenbrand (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153845#comment-13153845
 ] 

Linden Hillenbrand commented on HBASE-4834:
---

No worries, at least it was fixed and someone beat us to it. Thanks for your 
help on the investigation Harsh!






[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153850#comment-13153850
 ] 

Lars Hofhansl commented on HBASE-2856:
--

Thanks Jon!




 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852
 ] 

Lars Hofhansl commented on HBASE-2856:
--

@Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much 
more manual than I had wished or expected, so it is very possible that I 
missed something.

(I tried to upload the 0.92 patch to review board for easier verification but 
apparently that does not work for branches other than trunk.)






[jira] [Issue Comment Edited] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852
 ] 

Lars Hofhansl edited comment on HBASE-2856 at 11/20/11 7:20 PM:


@Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much 
more manual than I had wished or expected, so it is very possible that I missed 
something.

(I tried to upload the 0.92 patch to review board for easier verification, but 
apparently that does not work for branches other than trunk.)

  was (Author: lhofhansl):
@Nicolas and @Amit, could you review the 0.92 patch? I turned out to be 
much more manual than I had wished or expected, so it is very possible, that I 
missed something.

(I tried to upload the 0.92 patch to review board for easier verification but 
apparently that does not work for branches other than trunk.)
  





[jira] [Resolved] (HBASE-4831) LRU stats thread should be a daemon thread

2011-11-20 Thread Andrew Purtell (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-4831.
---

Resolution: Duplicate

Duplicate of HBASE-4745, already resolved

 LRU stats thread should be a daemon thread
 --

 Key: HBASE-4831
 URL: https://issues.apache.org/jira/browse/HBASE-4831
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani
Assignee: Prakash Khemani

 I have seen hung processes where the following was the only non-daemon 
 thread:
 LRU Statistics #0 prio=10 tid=0x2ab0bc04f800 nid=0x11ac waiting on 
 condition [0x42f57000]
java.lang.Thread.State: TIMED_WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  0x2aaab9a1c000 (a 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
   at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:609)
   at 
 java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:602)
   at 
 java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
   at java.lang.Thread.run(Thread.java:662)
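The hang follows from JVM semantics: the process exits only when every non-daemon thread has finished, so a scheduled stats thread like the one in the dump keeps the JVM alive indefinitely. A hedged sketch of the usual fix, with an illustrative thread name (this is not HBase's actual cache code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A stats scheduler whose worker thread is marked daemon, so it can never
// be the thread that keeps the process from exiting.
class DaemonStatsThread {
    static ScheduledExecutorService newStatsExecutor() {
        return Executors.newScheduledThreadPool(1, runnable -> {
            Thread t = new Thread(runnable, "LRU Statistics #0");
            t.setDaemon(true);  // let the JVM exit even while this runs
            return t;
        });
    }

    public static void main(String[] args) {
        ScheduledExecutorService exec = newStatsExecutor();
        exec.scheduleAtFixedRate(
            () -> System.out.println("stats tick"), 0, 60, TimeUnit.SECONDS);
        // main returns here; because the worker is a daemon, the process
        // exits instead of parking forever in DelayQueue.take().
    }
}
```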





[jira] [Updated] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Andrew Purtell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-2418:
--

   Resolution: Fixed
Fix Version/s: 0.94.0
 Release Note: 
This adds support for protecting the state of HBase znodes on a multi-tenant 
ZooKeeper cluster. This support requires ZK 3.4.0. It is a companion patch to 
HBASE-2742 (secure RPC), and HBASE-3025 (Coprocessor based access control).

SASL authentication of ZooKeeper clients with the quorum is handled in the ZK 
client independently of HBase concerns. To enable strong ZK authentication, one 
must create a suitable JAAS configuration, for example:

  Server {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/etc/hbase/conf/hbase.keytab"
    storeKey=true
    useTicketCache=false
    principal="zookeeper/$HOSTNAME";
  };
  Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    useTicketCache=false
    keyTab="/etc/hbase/conf/hbase.keytab"
    principal="hbase/$HOSTNAME";
  };

and then configure both the client and server processes to use it, for example 
in hbase-env.sh:

  export HBASE_OPTS="${HBASE_OPTS} -Djava.security.auth.login.config=/etc/hbase/conf/jaas.conf"
  export HBASE_OPTS="${HBASE_OPTS} -Dzookeeper.kerberos.removeHostFromPrincipal=true"
  export HBASE_OPTS="${HBASE_OPTS} -Dzookeeper.kerberos.removeRealmFromPrincipal=true"

HBase will then secure all znodes but for a few world-readable read-only ones 
needed for clients to look up region locations. All internal cluster operations 
will be protected from unauthenticated ZK clients, or clients not authenticated 
to the HBase principal. Presumably the only ZK clients authenticated to the 
HBase principal will be those embedded in the master and regionservers.

We will pull in a Hadoop artifact patched with HADOOP-7070 if building under 
the security profile (-P security). 0.20.205 does not yet include HADOOP-7070. 
Without it, the JAAS configuration required for secure operation of the 
ZooKeeper client will be ignored.
   Status: Resolved  (was: Patch Available)

Committed to trunk and 0.92.

TestZooKeeperACL passes with and without '-P security' locally. Does not break 
the build if '-P security' is not specified. Test failures found by HudsonQA 
are not directly related to this change.


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-2418-6.patch, HBASE-2418-6.patch


 Some users may run a ZooKeeper cluster in multi-tenant mode, meaning that 
 more than one client service shares a single ZooKeeper service instance 
 (cluster). In this case the client services typically want to protect their 
 data (ZK znodes) from access by other services (tenants) on the cluster. 
 Say you are running HBase, Solr, and Neo4j, or multiple HBase instances: 
 having authentication/authorization on the znodes is important both for 
 security and for helping to ensure that services don't interact negatively 
 (touch each other's data).
 Today HBase has no support for authentication or authorization. This should 
 be added to the HBase clients that access the ZK cluster. In general it 
 means calling addAuthInfo once after a session is established:
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String,
  byte[])
 with a user-specific credential, often a shared secret or certificate. You 
 may be able to configure this statically in some cases (a config string or 
 a file to read from); however, in my case in particular you may need to 
 access it programmatically, which adds complexity because the end user may 
 need to load code into HBase for accessing the credential.
 Secondly, you need to specify a non-world ACL when interacting with znodes 
 (primarily on create):
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html
 Feel free to ping the ZooKeeper team if you have questions. It might also 
 be good to discuss with some potential end users, in particular regarding 
 how the end user can specify the credential.
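
 The two steps named above (addAuthInfo once after the session is 
 established, then a non-world ACL on create) look roughly like the 
 following sketch. This uses the plain ZooKeeper client API, not the HBase 
 wrapper added by the patch; the quorum address, digest credential, and 
 znode path are hypothetical.

 {code}
 import java.util.Collections;
 import java.util.List;
 import org.apache.zookeeper.CreateMode;
 import org.apache.zookeeper.ZooDefs;
 import org.apache.zookeeper.ZooKeeper;
 import org.apache.zookeeper.data.ACL;
 import org.apache.zookeeper.data.Id;

 public class ZkAuthSketch {
   // A non-world ACL: full permissions for whatever identity the session
   // authenticated as, no permissions for anyone else.
   static List<ACL> buildAcl() {
     return Collections.singletonList(
         new ACL(ZooDefs.Perms.ALL, new Id("auth", "")));
   }

   public static void main(String[] args) throws Exception {
     // Hypothetical quorum address and shared secret.
     ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
     // Step 1: attach the credential once after the session is established.
     zk.addAuthInfo("digest", "hbase:secret".getBytes());
     // Step 2: create znodes with the restricted ACL instead of
     // ZooDefs.Ids.OPEN_ACL_UNSAFE.
     zk.create("/hbase-demo", new byte[0], buildAcl(), CreateMode.PERSISTENT);
     zk.close();
   }
 }
 {code}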

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153889#comment-13153889
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-0.92 #152 (See 
[https://builds.apache.org/job/HBase-0.92/152/])
HBASE-2418 Support for ZooKeeper authentication

apurtell : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java






[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153888#comment-13153888
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-TRUNK #2466 (See 
[https://builds.apache.org/job/HBase-TRUNK/2466/])
HBASE-2418 Support for ZooKeeper authentication

apurtell : 
Files : 
* /hbase/trunk/pom.xml
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java






[jira] [Updated] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-2418:
--

Attachment: 2418.addendum

Addendum adds Gary's maven repository to pom





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153894#comment-13153894
 ] 

Ted Yu commented on HBASE-2418:
---

Applied addendum to 0.92 branch.
Build 153 is running tests at this moment.





[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4809:
---

Attachment: D483.3.patch

mbautin updated the revision [jira] [HBASE-4809] Per-CF set RPC metrics.
Reviewers: nspiegelberg, JIRA, Kannan, Karthik

  Rebasing on the most recent changes to trunk and fixing a bug in 
StoreScanner. Unit tests pass; I will start a run on Hudson. Cluster testing is 
still to be done.

REVISION DETAIL
  https://reviews.facebook.net/D483

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
  
src/test/java/org/apache/hadoop/hbase/regionserver/metrics/TestSchemaMetrics.java


 Per-CF set RPC metrics
 --

 Key: HBASE-4809
 URL: https://issues.apache.org/jira/browse/HBASE-4809
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D483.1.patch, D483.2.patch, D483.3.patch


 Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to 
 trunk. For each mutation signature (the set of column families involved in an 
 RPC request) we increment several metrics, allowing us to monitor access 
 patterns. Guarding against an explosion of the number of metrics is handled 
 in HBASE-4638 (which might even be implemented as part of this JIRA).





[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4809:
--

Attachment: HBASE-4809_Per_CF_set_RPC_metrics.patch

This corresponds to D483.3.patch.





[jira] [Updated] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4809:
--

Release Note: Testing the patch on Hudson.
  Status: Patch Available  (was: Open)





[jira] [Issue Comment Edited] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153852#comment-13153852
 ] 

Lars Hofhansl edited comment on HBASE-2856 at 11/20/11 11:38 PM:
-

@Nicolas and @Amit, could you review the 0.92 patch? It turned out to be much 
more manual than I had wished or expected, so it is very possible that I missed 
something.

(I tried to upload the 0.92 patch to review board for easier verification, but 
apparently that does not work for branches other than trunk.)

  was (Author: lhofhansl):
@Nicolas and @Amit, could you review the 0.92 patch? I turned out to be 
much more manual than I had wished or expected, so it is very possible that I 
missed something.

(I tried to upload the 0.92 patch to review board for easier verification, but 
apparently that does not work for branches other than trunk.)
  
 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153913#comment-13153913
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-0.92 #153 (See 
[https://builds.apache.org/job/HBase-0.92/153/])
HBASE-2418 Addendum adds Gary's maven repo to pom.xml

tedyu : 
Files : 
* /hbase/branches/0.92/pom.xml






[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153921#comment-13153921
 ] 

Andrew Purtell commented on HBASE-2418:
---

Thanks Ted. I thought that went in with HBASE-3025.





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153923#comment-13153923
 ] 

Andrew Purtell commented on HBASE-2418:
---

And it looks like this part of the POM in trunk is not in the POM on 0.92:

{code}
  <pluginRepositories>
    <pluginRepository>
      <id>ghelmling.testing</id>
      <name>Gary Helmling test repo</name>
      <url>http://people.apache.org/~garyh/mvn/</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </pluginRepository>
  </pluginRepositories>
{code}

I don't know enough about Maven or how Gary set up the security profile to know 
if it is needed or not. Gary?





[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153925#comment-13153925
 ] 

Jonathan Hsieh commented on HBASE-2856:
---

@larsh I posted it for you here: https://reviews.apache.org/r/2893/

I applied the patch, committed it, and generated a git patch via 'git 
format-patch HEAD^', which has enough info to find the right branch.





[jira] [Commented] (HBASE-4809) Per-CF set RPC metrics

2011-11-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13153930#comment-13153930
 ] 

Hadoop QA commented on HBASE-4809:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504450/HBASE-4809_Per_CF_set_RPC_metrics.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -166 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 61 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.regionserver.wal.TestLogRolling

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/316//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/316//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/316//console

This message is automatically generated.

 Per-CF set RPC metrics
 --

 Key: HBASE-4809
 URL: https://issues.apache.org/jira/browse/HBASE-4809
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D483.1.patch, D483.2.patch, D483.3.patch, 
 HBASE-4809_Per_CF_set_RPC_metrics.patch


 Porting per-CF set metrics for RPC times and response sizes from 0.89-fb to 
 trunk. For each mutation signature (the set of column families involved in an 
 RPC request) we increment several metrics, allowing us to monitor access 
 patterns.  Guarding against an explosion of the number of metrics is handled 
 in HBASE-4638 (which might even be implemented as part of this JIRA).
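The mutation-signature bookkeeping can be sketched as follows; all class and method names here are illustrative assumptions, not the code in the attached patches:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of per-column-family RPC metrics keyed by "mutation signature":
// the sorted set of column families touched by one request.
public class PerCfMetrics {
    private final Map<String, AtomicLong> rpcCounts = new ConcurrentHashMap<>();

    // Build a stable key from the families in the request, so that
    // {cf1, cf2} and {cf2, cf1} map to the same metric.
    static String signature(Set<String> families) {
        return String.join(",", new TreeSet<>(families));
    }

    void recordRpc(Set<String> families) {
        rpcCounts.computeIfAbsent(signature(families), k -> new AtomicLong())
                 .incrementAndGet();
    }

    long count(Set<String> families) {
        AtomicLong c = rpcCounts.get(signature(families));
        return c == null ? 0 : c.get();
    }

    public static void main(String[] args) {
        PerCfMetrics m = new PerCfMetrics();
        m.recordRpc(Set.of("cf1", "cf2"));
        m.recordRpc(Set.of("cf2", "cf1")); // same signature as above
        System.out.println(m.count(Set.of("cf1", "cf2"))); // prints 2
    }
}
```

The metric-explosion concern mentioned above arises because the number of distinct signatures grows with the number of CF subsets seen; HBASE-4638 would presumably cap or collapse rare signatures.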





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153931#comment-13153931
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-0.92-security #2 (See 
[https://builds.apache.org/job/HBase-0.92-security/2/])
HBASE-2418 Addendum adds Gary's maven repo to pom.xml
HBASE-2418 Support for ZooKeeper authentication

tedyu : 
Files : 
* /hbase/branches/0.92/pom.xml

apurtell : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/MiniZooKeeperCluster.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperACL.java


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch


 Some users may run a ZooKeeper cluster in multi tenant mode meaning that 
 more than one client service would
 like to share a single ZooKeeper service instance (cluster). In this case the 
 client services typically want to protect
 their data (ZK znodes) from access by other services (tenants) on the 
 cluster. Say you are running HBase and Solr 
 and Neo4j, or multiple HBase instances, etc... having 
 authentication/authorization on the znodes is important for both 
 security and helping to ensure that services don't interact negatively (touch 
 each other's data).
 Today HBase does not have support for authentication or authorization. This 
 should be added to the HBase clients
 that are accessing the ZK cluster. In general it means calling addAuthInfo 
 once after a session is established:
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String,
  byte[])
 with a user specific credential, often times this is a shared secret or 
 certificate. You may be able to statically configure this
 in some cases (config string or file to read from), however in my case in 
 particular you may need to access it programmatically,
 which adds complexity as the end user may need to load code into HBase for 
 accessing the credential.
 Secondly you need to specify a non world ACL when interacting with znodes 
 (create primarily):
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html
 Feel free to ping the ZooKeeper team if you have questions. It might also be 
 good to discuss with some 
 potential end users - in particular regarding how the end user can specify 
 the credential.
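The pattern described above (call addAuthInfo once after the session is established, then create znodes with a non-world ACL) needs a live ZooKeeper quorum to demonstrate, so the following is a self-contained sketch of just the ACL-matching idea. The AclEntry type and canAccess method are inventions for illustration, not ZooKeeper's actual classes:

```java
import java.util.List;

// Self-contained sketch of why per-tenant ACLs isolate znodes: a znode
// carries (scheme, id) ACL entries, and a client's auth info must match
// one of them. Mimics the shape of ZooKeeper digest ACLs only.
public class ZnodeAclSketch {
    record AclEntry(String scheme, String id) {}

    // The world-readable entry that a multi-tenant setup should avoid
    // for sensitive znodes.
    static final AclEntry WORLD = new AclEntry("world", "anyone");

    static boolean canAccess(List<AclEntry> znodeAcl, String scheme, String authId) {
        for (AclEntry e : znodeAcl) {
            if (e.equals(WORLD)) return true; // open to everyone
            if (e.scheme().equals(scheme) && e.id().equals(authId)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<AclEntry> hbaseOnly = List.of(new AclEntry("digest", "hbase"));
        System.out.println(canAccess(hbaseOnly, "digest", "hbase")); // prints true
        System.out.println(canAccess(hbaseOnly, "digest", "solr"));  // prints false
    }
}
```

With the real client the calls would be roughly `zk.addAuthInfo("digest", credential)` once after connecting, followed by `zk.create(path, data, acl, createMode)` with a non-world ACL list, as the linked Javadoc shows.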





[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153934#comment-13153934
 ] 

Andrew Purtell commented on HBASE-2418:
---

Hudson returned another build failure report. I committed the above to the 0.92 
POM.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153942#comment-13153942
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/
---

Review request for hbase, Todd Lipcon and Jonathan Robie.


Summary
---

Distributed log splitting coding enhancement to make it easier to understand, 
no semantics change.
These are issues raised during code review while backporting this feature to 
CDH.


This addresses bug HBASE-4820.
https://issues.apache.org/jira/browse/HBASE-4820


Diffs
-

  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java f7ef653 
  src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java b9a3a2c 
  src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 7dd67e9 
  src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 
1d329b0 
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 
21747b1 
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 
51daa1f 
  src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java c8684ec 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 
84d76e8 

Diff: https://reviews.apache.org/r/2895/diff


Testing
---

Ran unit tests. Non-flaky tests are green. Two client tests didn't pass; they 
are not related to this change.


Thanks,

Jimmy



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues that make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.





[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153966#comment-13153966
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3385
---



src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
https://reviews.apache.org/r/2895/#comment7563

handleDeadWorkers would be a better method name.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7562

Please remove white space.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7564

retry_count is the remaining count. This log message should be clearer.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7565

Can we implement this item now?



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7566

We should say 'remaining retries='



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7567

Please adjust indentation.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7571

Please adjust indentation for these 4 lines.



src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
https://reviews.apache.org/r/2895/#comment7570

Should read 'splitlog workers'



src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
https://reviews.apache.org/r/2895/#comment7572

Adjust indentation, please.


- Ted


On 2011-11-21 02:06:29, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2895/
bq.  ---
bq.  
bq.  (Updated 2011-11-21 02:06:29)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Distributed log splitting coding enhancement to make it easier to 
understand, no semantics change.
bq.  These are issues raised during code review while backporting this 
feature to CDH.
bq.  
bq.  
bq.  This addresses bug HBASE-4820.
bq.  https://issues.apache.org/jira/browse/HBASE-4820
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
f7ef653 
bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 
b9a3a2c 
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7dd67e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 
1d329b0 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 
21747b1 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 
51daa1f 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
c8684ec 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 
84d76e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2895/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, 
which are not related to this change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch




[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153968#comment-13153968
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-0.92 #154 (See 
[https://builds.apache.org/job/HBase-0.92/154/])
Amend HBASE-2418 Add pluginRepositories to POM

apurtell : 
Files : 
* /hbase/branches/0.92/pom.xml


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread gaojinchao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153969#comment-13153969
 ] 

gaojinchao commented on HBASE-4739:
---

HBASE-4739_trail5 makes a few changes. Please review; if it makes sense, I 
will verify it in a real cluster.


 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday: when 
 the master died, it had just created the RIT znode for a region but hadn't 
 yet told the RS to close it.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.
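The fix discussed in the comments is to have the timeout monitor re-send the close RPC for regions stuck in CLOSING, rather than logging "doing nothing". A minimal sketch of that sweep, with invented names (RegionInTransition, sweep) rather than the real AssignmentManager types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed timeout-monitor behavior: when a region has
// been stuck in CLOSING longer than the timeout, re-issue the close RPC
// to its server instead of only logging a warning.
public class RitTimeoutSketch {
    enum State { CLOSING, CLOSED }

    static class RegionInTransition {
        final String region;
        final String server;
        final long ts; // when the state was last updated
        State state = State.CLOSING;
        RegionInTransition(String region, String server, long ts) {
            this.region = region;
            this.server = server;
            this.ts = ts;
        }
    }

    // Returns the close RPCs that were re-sent during this sweep.
    static List<String> sweep(Map<String, RegionInTransition> rits, long now, long timeoutMs) {
        List<String> resent = new ArrayList<>();
        for (RegionInTransition rit : rits.values()) {
            if (rit.state == State.CLOSING && now - rit.ts > timeoutMs) {
                // Re-issuing the close is safe if the RPC is idempotent
                // on the region server side.
                resent.add("close " + rit.region + " on " + rit.server);
            }
        }
        return resent;
    }

    public static void main(String[] args) {
        Map<String, RegionInTransition> rits = new HashMap<>();
        rits.put("r1", new RegionInTransition("r1", "rs1", 0));
        System.out.println(sweep(rits, 60_000, 30_000)); // prints [close r1 on rs1]
    }
}
```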





[jira] [Issue Comment Edited] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152585#comment-13152585
 ] 

Ted Yu edited comment on HBASE-4739 at 11/21/11 4:09 AM:
-

@J-D
In the 0.92 version, use HBASE-4739_Trunk_V2 in the timeout monitor for 
sending a CLOSING rpc (I will try to modify this patch).
In trunk, use patch 4739_trialV3.
HBase is used by thousands of people. If this problem occurred once, it may 
occur again, so I think we need to solve this issue.

What do you say J-D? 

I will do some more detailed testing about these patches and give my test cases.



  was (Author: sunnygao):
@J-D
In 0.92 version, uses HBASE-4739_Trunk_V2 in timeout monitor for sending a 
CLOSING rpc.(I try to modify this patch)
In trunk, uses patch 4739_trialV3.
Hbase thousands of people in the use of, If we once, may appear more. So I 
think we need slove this isse.

What do you say J-D? 

I will do some more detailed testing about these patches and give my test cases.


  
 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch







[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153974#comment-13153974
 ] 

Gary Helmling commented on HBASE-2418:
--

The {{pluginRepositories/}} entry was added for HBASE-4763/HBASE-4781 for the 
custom maven-surefire build.  It's not needed for the security components and 
should not be in the 0.92 branch as far as I can tell (HBASE-4781 is marked for 
0.94).

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153975#comment-13153975
 ] 

Hudson commented on HBASE-2418:
---

Integrated in HBase-0.92-security #3 (See 
[https://builds.apache.org/job/HBase-0.92-security/3/])
Amend HBASE-2418 Add pluginRepositories to POM

apurtell : 
Files : 
* /hbase/branches/0.92/pom.xml


 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch







[jira] [Commented] (HBASE-2418) add support for ZooKeeper authentication

2011-11-20 Thread Gary Helmling (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153976#comment-13153976
 ] 

Gary Helmling commented on HBASE-2418:
--

http://monitoring.apache.org/status/ is showing people.apache.org 
(minotaur.apache.org) is down.  This is probably the cause of the build 
failures, which show connection timeouts retrieving artifacts from my repo.

 add support for ZooKeeper authentication
 

 Key: HBASE-2418
 URL: https://issues.apache.org/jira/browse/HBASE-2418
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Reporter: Patrick Hunt
Assignee: Eugene Koontz
Priority: Critical
  Labels: security, zookeeper
 Fix For: 0.92.0, 0.94.0

 Attachments: 2418.addendum, HBASE-2418-6.patch, HBASE-2418-6.patch


 Some users may run a ZooKeeper cluster in multi tenant mode meaning that 
 more than one client service would
 like to share a single ZooKeeper service instance (cluster). In this case the 
 client services typically want to protect
 their data (ZK znodes) from access by other services (tenants) on the 
 cluster. Say you are running HBase and Solr 
 and Neo4j, or multiple HBase instances, etc... having 
 authentication/authorization on the znodes is important for both 
 security and helping to ensure that services don't interact negatively (touch 
 each other's data).
 Today HBase does not have support for authentication or authorization. This 
 should be added to the HBase clients
 that are accessing the ZK cluster. In general it means calling addAuthInfo 
 once after a session is established:
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooKeeper.html#addAuthInfo(java.lang.String,
  byte[])
 with a user specific credential, often times this is a shared secret or 
 certificate. You may be able to statically configure this
 in some cases (config string or file to read from), however in my case in 
 particular you may need to access it programmatically,
 which adds complexity as the end user may need to load code into HBase for 
 accessing the credential.
 Secondly you need to specify a non world ACL when interacting with znodes 
 (create primarily):
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/data/ACL.html
 http://hadoop.apache.org/zookeeper/docs/current/api/org/apache/zookeeper/ZooDefs.html
 Feel free to ping the ZooKeeper team if you have questions. It might also be 
 good to discuss with some 
 potential end users - in particular regarding how the end user can specify 
 the credential.
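For the common "digest" scheme, the credential passed to addAuthInfo and the matching non-world ACL id are both derived from a user:password pair. A minimal sketch of that derivation in plain Java (the "hbase:secret" credential is invented for illustration; ZooKeeper's DigestAuthenticationProvider implements the same formula):

```java
import java.security.MessageDigest;
import java.util.Base64;

public class ZkDigestDemo {
    // Same formula as ZooKeeper's DigestAuthenticationProvider.generateDigest:
    // "user:password" -> "user:" + base64(sha1("user:password")).
    static String generateDigest(String idPassword) throws Exception {
        String user = idPassword.split(":", 2)[0];
        byte[] sha1 = MessageDigest.getInstance("SHA-1")
                .digest(idPassword.getBytes("UTF-8"));
        return user + ":" + Base64.getEncoder().encodeToString(sha1);
    }

    public static void main(String[] args) throws Exception {
        // A client would authenticate with:
        //   zk.addAuthInfo("digest", "hbase:secret".getBytes());
        // and create znodes with an ACL whose id is:
        System.out.println(generateDigest("hbase:secret"));
    }
}
```

The server never stores the plaintext password; the ACL carries only the user and the base64-encoded SHA-1 digest.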

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+

2011-11-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153977#comment-13153977
 ] 

stack commented on HBASE-4830:
--

Thanks Todd.

Here's the OOME out in IPC:

{code}
Exception in thread "IPC Reader 8 on port 7003" java.lang.OutOfMemoryError: 
Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1157)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

Because it happened out here we don't get the benefit of HBASE-4769 and abort 
immediately.  Need a fix so we abort immediately instead of hanging around in 
zombie mode as this server was doing.
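One way to get the "abort instead of zombie" behavior for errors that escape outside the covered paths is a last-resort uncaught-exception handler. This is a hypothetical sketch, not the HBASE-4769 mechanism itself; the class and method names are invented:

```java
public class AbortOnError {
    // Classify which escaped throwables warrant an immediate process halt.
    static boolean isFatal(Throwable e) {
        return e instanceof Error;   // OutOfMemoryError, StackOverflowError, ...
    }

    // Install a last-resort handler: if an Error escapes any thread (as the
    // OOME above escaped an IPC reader thread), halt the process instead of
    // leaving a zombie server running.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> {
            if (isFatal(e)) {
                System.err.println("Aborting: fatal error in " + t.getName() + ": " + e);
                Runtime.getRuntime().halt(1);   // bypass shutdown hooks, die now
            }
        });
    }
}
```

halt(1) rather than exit(1) is deliberate here: after an OOME, shutdown hooks may themselves hang or fail.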

 Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno 
 running 0.20.205.0+
 ---

 Key: HBASE-4830
 URL: https://issues.apache.org/jira/browse/HBASE-4830
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: hbase-stack-regionserver-sv4r9s38.out


 Running 0.20.205.1 (I was not at tip of the branch) I ran into the following 
 hung regionserver:
 {code}
 "regionserver7003.logRoller" daemon prio=10 tid=0x7fd98028f800 nid=0x61af 
 in Object.wait() [0x7fd987bfa000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
 - locked <0xf8656788> (a java.util.LinkedList)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
 - locked <0xf8656458> (a 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
 at 
 org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
 at 
 org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
 - locked <0xf8655998> (a 
 org.apache.hadoop.io.SequenceFile$Writer)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
 - locked <0xc443deb0> (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Other threads are like this (here's a sample):
 {code}
 "regionserver7003.logSyncer" daemon prio=10 tid=0x7fd98025e000 nid=0x61ae 
 waiting for monitor entry [0x7fd987cfb000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
 - waiting to lock <0xc443deb0> (a java.lang.Object)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
 at java.lang.Thread.run(Thread.java:662)
 
 "IPC Server handler 0 on 7003" daemon prio=10 tid=0x7fd98049b800 
 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
 - waiting to lock <0xc443deb0> (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
 at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
 at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at 

[jira] [Commented] (HBASE-4833) HRegionServer stops could be 0,5s faster

2011-11-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153981#comment-13153981
 ] 

stack commented on HBASE-4833:
--

+1 on patch but can't commit it if it breaks 
TestRegionServerCoprocessorExceptionWithAbort.  Any chance of including fix for 
that N?

 HRegionServer stops could be 0,5s faster
 

 Key: HBASE-4833
 URL: https://issues.apache.org/jira/browse/HBASE-4833
 Project: HBase
  Issue Type: Improvement
  Components: regionserver, test
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Attachments: 4833_trunk_hregionserver.patch


 The current implementation of HRegionServer#stop is
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     synchronized (this) {
       // Wakes run() if it is sleeping
       notifyAll(); // FindBugs NN_NAKED_NOTIFY
     }
   }
 {noformat}
 The notification is sent on the wrong object and does nothing. As a 
 consequence, the region server continues to sleep instead of waking up and 
 stopping immediately. A correct implementation is:
 {noformat}
   public void stop(final String msg) {
     this.stopped = true;
     LOG.info("STOPPED: " + msg);
     // Wakes run() if it is sleeping
     sleeper.skipSleepCycle();
   }
 {noformat}
 Then the region server stops immediately. This makes the region server stop 
 0.5s faster on average, which is quite useful for unit tests.
 However, with this fix, TestRegionServerCoprocessorExceptionWithAbort does 
 not work.
 It is likely because the code does not expect the region server to stop that 
 fast. See HBASE-4832
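The mechanics of the bug and the fix can be sketched in plain Java: run() sleeps by waiting on the Sleeper's internal lock, so notifyAll() on the HRegionServer instance wakes nobody, while skipSleepCycle() notifies the right monitor. This is a simplified stand-in, not the actual HBase Sleeper class:

```java
// Minimal model of HBase's Sleeper: sleep() waits on an internal lock, not on
// the HRegionServer instance, which is why stop()'s notifyAll() never woke it.
class Sleeper {
    private final Object sleepLock = new Object();
    private boolean triggerWake = false;
    private final int period;

    Sleeper(int periodMs) { this.period = periodMs; }

    void sleep() {
        synchronized (sleepLock) {
            if (triggerWake) { triggerWake = false; return; }
            try {
                sleepLock.wait(period);  // woken early only via sleepLock.notifyAll()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            triggerWake = false;
        }
    }

    // The fix: notify the monitor the run() loop actually waits on.
    void skipSleepCycle() {
        synchronized (sleepLock) {
            triggerWake = true;
            sleepLock.notifyAll();
        }
    }
}

public class StopDemo {
    public static void main(String[] args) {
        Sleeper sleeper = new Sleeper(5000);
        sleeper.skipSleepCycle();   // what the corrected stop() calls
        long t0 = System.nanoTime();
        sleeper.sleep();            // returns immediately instead of waiting 5s
        System.out.println("sleep returned after "
                + (System.nanoTime() - t0) / 1_000_000 + " ms");
    }
}
```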

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153983#comment-13153983
 ] 

stack commented on HBASE-2856:
--

You fellas want this in 0.92?  I want to cut a 0.92 RC.  I have 0.92 tests 
passing up on jenkins a few times in a row now and all criticals and blockers 
are in.  Should we wait?  Or should we cut the RC and get this into the second 
RC (I'm sure there'll be one)?

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).
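The read-consistency loss described above can be modeled in a few lines: memstore cells carry a memstoreTS that hides writes newer than a scanner's read point, but a flush writes cells without that field, so previously invisible "uncommitted" cells become visible mid-scan. This is a toy model with invented names, not HBase code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Cell {
    final String value;
    final long memstoreTS;   // 0 means "committed", visible to every reader
    Cell(String value, long memstoreTS) { this.value = value; this.memstoreTS = memstoreTS; }
}

public class FlushVisibilityDemo {
    // A scanner only sees cells at or below its read point.
    static List<String> visible(List<Cell> store, long readPoint) {
        List<String> out = new ArrayList<>();
        for (Cell c : store)
            if (c.memstoreTS <= readPoint) out.add(c.value);
        return out;
    }

    // The flush modeled here drops memstoreTS (the HFile has no slot for it),
    // so every flushed cell becomes visible to every reader.
    static List<Cell> flush(List<Cell> memstore) {
        List<Cell> hfile = new ArrayList<>();
        for (Cell c : memstore) hfile.add(new Cell(c.value, 0));
        return hfile;
    }

    public static void main(String[] args) {
        List<Cell> memstore = Arrays.asList(new Cell("a", 1), new Cell("b", 2));
        long readPoint = 1;  // scanner opened before "b" committed
        System.out.println(visible(memstore, readPoint));         // [a]
        System.out.println(visible(flush(memstore), readPoint));  // [a, b] <- the bug
    }
}
```

Persisting memstoreTS into the HFile, the first solution proposed above, would let visible() keep filtering correctly after the flush.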

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153985#comment-13153985
 ] 

stack commented on HBASE-2856:
--

Do all tests pass w/ 0.92 version of this patch in place?

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4820) Distributed log splitting coding enhancement to make it easier to understand, no semantics change

2011-11-20 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153986#comment-13153986
 ] 

jirapos...@reviews.apache.org commented on HBASE-4820:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2895/#review3388
---



src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
https://reviews.apache.org/r/2895/#comment7574

Can we make this msg more clear? Something like:
"Unexpected state : statename.. Cannot transit znode state from : 
currentState to OFFLINE."


- ramkrishna


On 2011-11-21 02:06:29, Jimmy Xiang wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2895/
bq.  ---
bq.  
bq.  (Updated 2011-11-21 02:06:29)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon and Jonathan Robie.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Distributed log splitting coding enhancement to make it easier to 
understand, no semantics change.
bq.  These are some issues raised during the code review while back porting 
this feature to CDH.
bq.  
bq.  
bq.  This addresses bug HBASE-4820.
bq.  https://issues.apache.org/jira/browse/HBASE-4820
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
f7ef653 
bq.src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java 
b9a3a2c 
bq.src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java 
7dd67e9 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java 
1d329b0 
bq.src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java 
21747b1 
bq.
src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java 
51daa1f 
bq.src/test/java/org/apache/hadoop/hbase/master/TestSplitLogManager.java 
c8684ec 
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitLogWorker.java 
84d76e8 
bq.  
bq.  Diff: https://reviews.apache.org/r/2895/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Ran unit tests. Non-flaky tests are green. Two client tests didn't pass, 
which are not related to this change.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Jimmy
bq.  
bq.



 Distributed log splitting coding enhancement to make it easier to understand, 
 no semantics change
 -

 Key: HBASE-4820
 URL: https://issues.apache.org/jira/browse/HBASE-4820
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
  Labels: newbie
 Fix For: 0.94.0

 Attachments: 
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch,
  
 0001-HBASE-4820-Distributed-log-splitting-coding-enhancement-to-makeit-easier-to-understand,-no-semantics-change..patch


 In reviewing the distributed log splitting feature, we found some cosmetic 
 issues that make the code hard to understand.
 It would be great to fix them.  For this issue, there should be no semantic 
 change.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153999#comment-13153999
 ] 

Hadoop QA commented on HBASE-4739:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12504464/HBASE-4739_trail5.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -166 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 60 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/317//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/317//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/317//console

This message is automatically generated.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154001#comment-13154001
 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
---

This is similar to the initial version you had given.
+1 with the change.  As I said, the RS will anyway not allow the transition 
from happening again if it was already processing it.

But we need to confirm whether we need to handle the 
RegionAlreadyInTransitionException.  It may be needed.

 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154002#comment-13154002
 ] 

Lars Hofhansl commented on HBASE-2856:
--

Re: 0.92, I was going by your comment above
bq. If someone did it in next day or so, I'd be up for having it committed to 
0.92 in time for first RC.

It's not entirely clean, yet:

{noformat}
Results :

Failed tests:   testClosing(org.apache.hadoop.hbase.client.TestHCM)
  
testFilterAcrossMultipleRegions(org.apache.hadoop.hbase.client.TestFromClientSide):
 expected:<17576> but was:<28064>
  testForceSplit(org.apache.hadoop.hbase.client.TestAdmin): Scanned more than 
expected (6000)
  testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): Scanned 
more than expected (6000)
  
testSplitWhileBulkLoadPhase(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery)
  
testGroupOrSplitPresplit(org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery)
  testWholesomeSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransaction)
  testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction)
  testBasicSplit(org.apache.hadoop.hbase.regionserver.TestHRegion)

Tests in error: 
  
testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster):
 test timed out after 30 milliseconds

Tests run: 1065, Failures: 9, Errors: 1, Skipped: 7
{noformat}

I have no time to look at these tonight, though. But that probably points to 
another RC.

Would sure be nice if the acid guarantees that HBase claims would be met in 
0.92 :)


 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4830) Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno running 0.20.205.0+

2011-11-20 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154004#comment-13154004
 ] 

Todd Lipcon commented on HBASE-4830:


We could ship with a config -XX:OnOutOfMemoryError="kill -9 %p" or whatever 
that trick is...
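Todd's suggestion amounts to a single JVM flag. A hypothetical hbase-env.sh fragment (the variable name follows the usual HBase convention; the exact wiring is not specified in this thread):

```shell
# Hypothetical hbase-env.sh fragment: have the JVM kill itself the moment an
# OutOfMemoryError is reported, instead of lingering as a zombie.
# %p expands to the JVM's own pid.
export HBASE_OPTS="$HBASE_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
```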

 Regionserver BLOCKED on WAITING DFSClient$DFSOutputStream.waitForAckedSeqno 
 running 0.20.205.0+
 ---

 Key: HBASE-4830
 URL: https://issues.apache.org/jira/browse/HBASE-4830
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: hbase-stack-regionserver-sv4r9s38.out


 Running 0.20.205.1 (I was not at tip of the branch) I ran into the following 
 hung regionserver:
 {code}
 "regionserver7003.logRoller" daemon prio=10 tid=0x7fd98028f800 nid=0x61af 
 in Object.wait() [0x7fd987bfa000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(DFSClient.java:3606)
 - locked <0xf8656788> (a java.util.LinkedList)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3595)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3687)
 - locked <0xf8656458> (a 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
 at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3626)
 at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
 at 
 org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
 at 
 org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
 - locked <0xf8655998> (a 
 org.apache.hadoop.io.SequenceFile$Writer)
 at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.close(SequenceFileLogWriter.java:214)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.cleanupCurrentWriter(HLog.java:791)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:578)
 - locked <0xc443deb0> (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Other threads are like this (here's a sample):
 {code}
 "regionserver7003.logSyncer" daemon prio=10 tid=0x7fd98025e000 nid=0x61ae 
 waiting for monitor entry [0x7fd987cfb000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1074)
 - waiting to lock <0xc443deb0> (a java.lang.Object)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1195)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1057)
 at java.lang.Thread.run(Thread.java:662)
 
 "IPC Server handler 0 on 7003" daemon prio=10 tid=0x7fd98049b800 
 nid=0x61b8 waiting for monitor entry [0x7fd9872f1000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:1007)
 - waiting to lock <0xc443deb0> (a java.lang.Object)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1798)
 at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1668)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2980)
 at sun.reflect.GeneratedMethodAccessor636.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
 {code}
 Looks like HDFS-1529?  (Todd?)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-2856) TestAcidGuarantee broken on trunk

2011-11-20 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154009#comment-13154009
 ] 

Lars Hofhansl commented on HBASE-2856:
--

@Jon: Thanks for uploading to RB, btw.

 TestAcidGuarantee broken on trunk 
 --

 Key: HBASE-2856
 URL: https://issues.apache.org/jira/browse/HBASE-2856
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621
Reporter: ryan rawson
Assignee: Amitanand Aiyer
Priority: Blocker
 Fix For: 0.94.0

 Attachments: 2856-0.92.txt, 2856-v2.txt, 2856-v3.txt, 2856-v4.txt, 
 2856-v5.txt, 2856-v6.txt, 2856-v7.txt, 2856-v8.txt, 
 2856-v9-all-inclusive.txt, acid.txt


 TestAcidGuarantee has a test whereby it attempts to read a number of columns 
 from a row, and every so often the first column of N is different, when it 
 should be the same.  This is a bug deep inside the scanner whereby the first 
 peek() of a row is done at time T then the rest of the read is done at T+1 
 after a flush, thus the memstoreTS data is lost, and previously 'uncommitted' 
 data becomes committed and flushed to disk.
 One possible solution is to introduce the memstoreTS (or similarly equivalent 
 value) to the HFile thus allowing us to preserve read consistency past 
 flushes.  Another solution involves fixing the scanners so that peek() is not 
 destructive (and thus might return different things at different times alas).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread gaojinchao (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154010#comment-13154010
 ] 

gaojinchao commented on HBASE-4739:
---

I think we don't need to handle the RegionAlreadyInTransitionException. We 
only need to update the timestamp of the RIT, which we have done.
My reasons are:
1. The monitor timeout is 30 minutes; there is enough time to close a region.
2. If the RS throws a RegionAlreadyInTransitionException, we need to 
update the timestamp of the RIT and wait for the next timeout.


 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Trunk.patch, HBASE-4739_Trunk_V2.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
 the master died it had just created the RIT znode for a region but didn't 
 tell the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-20 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154026#comment-13154026
 ] 

ramkrishna.s.vasudevan commented on HBASE-4739:
-----------------------------------------------

Yes Gao, that is what I meant when I said we need to handle 
RegionAlreadyInTransitionException: explicitly catch that exception in the 
catch block of unassign() and handle it by updating the timestamp of the RIT.

Good work.
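The fix described above can be sketched as follows. This is a minimal illustration, not the actual HBase patch: all class and method names here (sendCloseRegion, updateTimestampToNow, the standalone RegionState) are illustrative stand-ins for the master-side structures in AssignmentManager. The point is the catch block: when the RS reports the region is already in transition, the master refreshes the RIT timestamp instead of treating the close as failed, so the TimeoutMonitor simply waits for the next cycle.

```java
// Sketch only (hypothetical names, not HBase source): catching
// RegionAlreadyInTransitionException in unassign() and refreshing the
// region-in-transition timestamp so the timeout monitor waits again.

class RegionAlreadyInTransitionException extends Exception {
  RegionAlreadyInTransitionException(String msg) { super(msg); }
}

class RegionState {
  private long timestamp;
  RegionState(long ts) { this.timestamp = ts; }
  long getTimestamp() { return timestamp; }
  // Refreshing the stamp gives the RS a full timeout period to finish the close.
  void updateTimestampToNow() { this.timestamp = System.currentTimeMillis(); }
}

public class UnassignSketch {
  // Stand-in for the RPC asking the RS to close the region.
  static void sendCloseRegion(boolean alreadyInTransition)
      throws RegionAlreadyInTransitionException {
    if (alreadyInTransition) {
      throw new RegionAlreadyInTransitionException(
          "Region is already being closed by the RS");
    }
  }

  static void unassign(RegionState state, boolean alreadyInTransition) {
    try {
      sendCloseRegion(alreadyInTransition);
    } catch (RegionAlreadyInTransitionException e) {
      // The RS is already closing the region: do not re-send or abort.
      // Just update the RIT timestamp and let the TimeoutMonitor re-check
      // after the next timeout.
      state.updateTimestampToNow();
    }
  }

  public static void main(String[] args) {
    RegionState state = new RegionState(0L);
    unassign(state, true);
    System.out.println(state.getTimestamp() > 0
        ? "RIT timestamp refreshed" : "RIT timestamp stale");
  }
}
```

With this handling, the "Region has been CLOSING for too long" log line from the description is no longer a dead end, because the refreshed timestamp gives the region server another full timeout window to complete the close.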

