date:20111102


 [ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4722:
-

Attachment: logging-v2.txt

A bit more logging. Since I added this, I can't make it fail locally. It looks 
like a timing issue where we seemingly skip flushing. Still digging.

 TestGlobalMemStoreSize has started failing
 --

 Key: HBASE-4722
 URL: https://issues.apache.org/jira/browse/HBASE-4722
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: logging-v2.txt, logging.txt


 I'm digging in.  It fails occasionally for me locally too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4722) TestGlobalMemStoreSize has started failing


[ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141948#comment-13141948
 ] 

stack commented on HBASE-4722:
--

I committed logging-v2 so we can get more info when it fails on Jenkins, since 
I can't make it fail locally (and I'm going to bed...)


[jira] [Commented] (HBASE-4703) Improvements in tests


[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141957#comment-13141957
 ] 

nkeywal commented on HBASE-4703:


Yes, the joins are failing, and since the main thread does not end, the JVM 
does not exit. I pulled trunk again and I still have the issue with the RPC 
version.

 Improvements in tests
 -

 Key: HBASE-4703
 URL: https://issues.apache.org/jira/browse/HBASE-4703
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.92.0
 Environment: all
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
 Fix For: 0.92.0

 Attachments: 20111030_4703_all.patch, 20111030_4703_all.v2.patch, 
 20111030_4703_all.v3.patch, 20111030_4703_all.v4.patch


 Global:
  - When possible, make the tests use the default cluster configuration for 
 the number of region servers (1 instead of 2 or 3). This allows a faster 
 stop/start, and is a step toward a shared cluster configuration.
  - 'sleep': lower or remove the sleep-based synchronisation in the tests (in 
 HBaseTestingUtility, TestGlobalMemStoreSize, TestAdmin, 
 TestCoprocessorEndpoint, TestHFileOutputFormat, TestLruBlockCache, 
 TestServerCustomProtocol, TestReplicationSink)
  - Optimize 'put' by calling setWriteToWAL(false) when the 'put' is big 
 or in a loop. Not done for tests that rely on the WAL.
  
 Local issues:
 - TestTableInputFormatScan fully deletes the hadoop.tmp.dir directory on 
 tearDown, which makes it impossible to run in parallel with another test 
 using this directory
 - TestIdLock logs too much (9000 lines per second). Test time lowered to 15 
 seconds to make it a part of the small subset
 - TestMemoryBoundedLogMessageBuffer: useless System.out.println
 - io.hfile.TestReseekTo: useless System.out.println
 - TestTableInputFormat does not shut down the cluster
 - testGlobalMemStore does not shut down the cluster
 - rest.client.TestRemoteAdmin: simplified, does not use local admin, single 
 test instead of two.
 - HBaseTestingUtility#ensureSomeRegionServersAvailable starts only one 
 server; it should start the number of missing servers instead.
 - TestMergeTool should start/stop the dfs cluster with HBaseTestingUtility
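
 The 'sleep' item in the list above amounts to replacing fixed sleeps with 
 waiting on a condition. A minimal sketch of that idea follows. The helper 
 name waitFor and its parameters are hypothetical (this is not an existing 
 HBaseTestingUtility method), and it uses Java 8 lambdas for brevity:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

    /**
     * Polls the condition every pollMs milliseconds until it holds or
     * timeoutMs milliseconds have elapsed. Returns true if the condition held.
     * Hypothetical helper, for illustration only.
     */
    static boolean waitFor(long timeoutMs, long pollMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller should fail the test
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean flushed = new AtomicBoolean(false);
        // Stand-in for asynchronous work such as a memstore flush.
        Thread worker = new Thread(() -> flushed.set(true));
        worker.start();
        boolean ok = waitFor(5_000, 10, flushed::get);
        System.out.println("condition met: " + ok);
    }
}
```

 A test written this way returns as soon as the condition holds, and a slow 
 machine only costs time up to the timeout instead of causing a spurious 
 failure.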


[jira] [Created] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread nkeywal (Created) (JIRA)

TestAdmin hangs randomly in trunk
-

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Priority: Critical



From the logs in my env:
{noformat}
2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
    at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
{noformat}
Anyway, after this the log finishes with:
{noformat}
2011-11-01 15:54:35,132 INFO  [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
{noformat}
It's in:
{noformat}
sun.management.ThreadImpl.getThreadInfo1(Native Method)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
{noformat}
So that's at least why adding a timeout won't help, and maybe why it does not 
end at all. Adding a maximum number of retries to Threads#threadDumpingIsAlive 
could help.
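
As a rough illustration of the bounded-retry idea, here is a minimal sketch. It 
assumes, based only on the stack trace and the "Automatic Stack Trace every 60 
seconds" message, that the current method keeps joining until the thread dies. 
The class, the method name joinWithMaxRetries and its parameters are 
hypothetical, not the actual Threads API:

```java
public class BoundedJoinSketch {

    /**
     * Joins the thread in slices of waitMs milliseconds, giving up after
     * maxRetries slices. Returns true if the thread has terminated.
     * Hypothetical helper, for illustration only.
     */
    static boolean joinWithMaxRetries(Thread t, long waitMs, int maxRetries) {
        if (t == null) {
            return true;
        }
        for (int attempt = 1; attempt <= maxRetries && t.isAlive(); attempt++) {
            try {
                t.join(waitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return !t.isAlive();
            }
            if (t.isAlive()) {
                // A real implementation would print a thread dump here.
                System.out.println("Still waiting on " + t.getName()
                        + " (attempt " + attempt + " of " + maxRetries + ")");
            }
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        Thread quick = new Thread(() -> { });
        quick.start();
        System.out.println("terminated: " + joinWithMaxRetries(quick, 1_000, 3));
    }
}
```

With a bound like this, tearDownAfterClass could fail the test when the method 
returns false instead of blocking forever.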

I also wonder if the root cause of the hang is my modification regarding the 
WAL, with some threads surprised to see updates that were not written to the 
WAL. Here is the full stack dump:
{noformat}
Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
  State: TIMED_WAITING
  Blocked count: 360
  Waited count: 359
  Stack:
    java.lang.Object.wait(Native Method)
    org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
    org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
  State: WAITING
  Blocked count: 0
  Waited count: 4
  Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
    org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
  State: RUNNABLE
  Blocked count: 2
  Waited count: 0
  Stack:
    sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
    sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
    sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
    sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
Thread 152 (Master:0;localhost,39664,1320187706355):
  State: WAITING
  Blocked count: 217
  Waited count: 174
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
    org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
    org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)
    org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
    org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
    java.lang.Thread.run(Thread.java:662)
Thread 165 (LruBlockCache.EvictionThread):
  State: WAITING
  Blocked count: 0
  Waited count: 1
  Waiting on org.apache.hadoop.hbase.io.hfile.LruBlockCache$EvictionThread@3e9d7b56
  Stack:

[jira] [Commented] (HBASE-4703) Improvements in tests

2011-11-02 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141961#comment-13141961
 ] 

nkeywal commented on HBASE-4703:


Created HBASE-4724 to track this.


[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows


[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141962#comment-13141962
 ] 

jirapos...@reviews.apache.org commented on HBASE-4536:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2178/#review3009
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
https://reviews.apache.org/r/2178/#comment6726

Hi Lars, Isn't this early-out problematic? It doesn't take into account 
min-versions. It doesn't take into account the newly introduced 
keepDeletedCells mode.


- Prakash


On 2011-10-18 21:43:38, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2178/
bq.  ---
bq.  
bq.  (Updated 2011-10-18 21:43:38)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBase timerange Gets and Scans allow time travel in HBase, i.e. looking at the state of the data at any point in the past, provided the data is still around.
bq.  This did not work for deletes, however. Deletes would always mask all puts in the past.
bq.  This change adds a flag that can be set on HColumnDescriptor to enable retention of deleted rows.
bq.  These rows are still subject to TTL and/or VERSIONS.
bq.  
bq.  This changes the following:
bq.  1. There is a new flag on HColumnDescriptor enabling that behavior.
bq.  2. Allow gets/scans with a timerange to retrieve rows hidden by a delete marker, if the timerange does not include the delete marker.
bq.  3. Do not unconditionally collect all deleted rows during a compaction.
bq.  4. Allow a raw Scan, which retrieves all delete markers and deleted rows.
bq.  
bq.  The change is small'ish, but the logic is intricate, so please review carefully.
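
The visibility rule in item 2 of the summary can be pictured with a toy model. 
The sketch below is not HBase code and is unrelated to ScanQueryMatcher: it 
assumes a single put, a single delete marker, and a half-open time range 
[minTs, maxTs), and every name in it is hypothetical.

```java
public class KeepDeletedSketch {

    /**
     * Toy model: is a put written at putTs visible to a read over the
     * half-open time range [minTs, maxTs), given one delete marker at deleteTs?
     */
    static boolean isPutVisible(long putTs, long deleteTs,
                                long minTs, long maxTs, boolean keepDeleted) {
        boolean putInRange = putTs >= minTs && putTs < maxTs;
        if (!putInRange) {
            return false;
        }
        boolean markerCoversPut = deleteTs >= putTs;
        if (!markerCoversPut) {
            return true; // the put is newer than the delete marker
        }
        if (!keepDeleted) {
            return false; // previous behavior: the marker masks all older puts
        }
        // With retention enabled, the marker only hides the put when the
        // marker itself falls inside the requested time range.
        boolean markerInRange = deleteTs >= minTs && deleteTs < maxTs;
        return !markerInRange;
    }

    public static void main(String[] args) {
        long putTs = 10, deleteTs = 20;
        System.out.println(isPutVisible(putTs, deleteTs, 0, 15, true));
        System.out.println(isPutVisible(putTs, deleteTs, 0, 15, false));
        System.out.println(isPutVisible(putTs, deleteTs, 0, 25, true));
    }
}
```

The model ignores TTL and VERSIONS, which the summary says still apply to 
retained rows.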
bq.  
bq.  
bq.  This addresses bug HBASE-4536.
bq.  https://issues.apache.org/jira/browse/HBASE-4536
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Attributes.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java 1185362 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1185362 
bq.

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141971#comment-13141971
 ] 

nkeywal commented on HBASE-4724:


I reproduce the issue 100% of the time on trunk with 
TestAdmin#testCreateBadTables. Previously, I could not reproduce the error at 
all.


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141979#comment-13141979
 ] 

nkeywal commented on HBASE-4724:


I believe that this log:
{noformat}
org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
[...]
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1150)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1145)
    at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1821)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:522)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

could explain why we have this in the final thread dump:

{noformat}
Thread 149 (Master:0;localhost,36968,1320216715828):
  State: WAITING
  Blocked count: 148
  Waited count: 148
  Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@1d4f0fb4
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
    org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
    org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)
    org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:523)
    org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:468)
    org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:309)
    java.lang.Thread.run(Thread.java:662)
{noformat}


[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-02 Thread gaojinchao (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142008#comment-13142008
 ] 

gaojinchao commented on HBASE-4577:
---

The test failed, but it does not seem to be a problem with the patch.

 Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
 -

 Key: HBASE-4577
 URL: https://issues.apache.org/jira/browse/HBASE-4577
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch


 Minor issue while looking at the RS metrics:
 bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
 storefileSizeMB=2420, compressionRatio=1.0008
 I guess there's a truncation somewhere when it's adding the numbers up.
 FWIW there's no compression on that table.
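
 The truncation guess above can be illustrated with a small sketch. It 
 assumes, hypothetically, that one metric converts each store file size to 
 MB before summing while the other sums the bytes first. The class name, 
 method names and byte sizes are made up for the example and are not taken 
 from the region server code:

```java
public class TruncationSketch {

    static final long MB = 1024L * 1024L;

    /** Converts each size to whole MB first, then sums (loses a fraction per file). */
    static long truncateThenSum(long[] sizesInBytes) {
        long totalMb = 0;
        for (long size : sizesInBytes) {
            totalMb += size / MB; // integer division drops the remainder
        }
        return totalMb;
    }

    /** Sums the bytes first, then converts once (loses less than 1 MB overall). */
    static long sumThenTruncate(long[] sizesInBytes) {
        long totalBytes = 0;
        for (long size : sizesInBytes) {
            totalBytes += size;
        }
        return totalBytes / MB;
    }

    public static void main(String[] args) {
        // Eight hypothetical store files of identical size: 302 MB plus 300,000 bytes.
        long[] sizes = new long[8];
        java.util.Arrays.fill(sizes, 302L * MB + 300_000L);
        System.out.println("truncate then sum: " + truncateThenSum(sizes) + " MB");
        System.out.println("sum then truncate: " + sumThenTruncate(sizes) + " MB");
    }
}
```

 With the same underlying bytes, the two orders of operation report 
 different totals, which is one way a "compressed" size could end up larger 
 than the uncompressed one on an uncompressed table.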


[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB


[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142007#comment-13142007
 ] 

Ted Yu commented on HBASE-4577:
---

I don't see 'Too many open files' for 
https://builds.apache.org/job/PreCommit-HBASE-Build/133//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoadWithSplit/


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk

2011-11-02 Thread Lucian George Iordache (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Attachment: 2002_4724_TestAdmin.patch


[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation


 [ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucian George Iordache updated HBASE-4713:
--

Affects Version/s: 0.90.4
   Status: Patch Available  (was: Open)

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Attachments: HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at
 warn. I ran into the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server =>
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client
 side, and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
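
 The change requested above can be sketched roughly as follows. This is not
 the attached patch and not the HConnectionManager code: the class name
 WarnOnExecutionException and the helper callAndLog are hypothetical, and
 java.util.logging stands in for whatever logging API the client actually
 uses. It only illustrates that an ExecutionException logged at warn remains
 visible when the effective log level is info.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; not the HBase client code.
final class WarnOnExecutionException {
    private static final Logger LOG =
            Logger.getLogger(WarnOnExecutionException.class.getName());

    /** Runs the task; returns its result, or null after logging a failure at WARNING. */
    static <T> T callAndLog(ExecutorService pool, Callable<T> task) {
        Future<T> future = pool.submit(task);
        try {
            return future.get();
        } catch (ExecutionException e) {
            // WARNING is above INFO, so this stays visible at the default level.
            LOG.log(Level.WARNING, "Task failed: " + e.getCause(), e);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Simulates the scan timing out on the server side.
            Callable<String> failing = () -> {
                throw new java.net.SocketTimeoutException("simulated timeout");
            };
            callAndLog(pool, failing);
        } finally {
            pool.shutdown();
        }
    }
}
```

 Had the same call used a debug-equivalent level (Level.FINE here), nothing
 would be printed with the logger at info, which matches the situation
 described in the report.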

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Assignee: nkeywal
  Status: Patch Available  (was: Open)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, server = 29)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on Master:0;localhost,39664,1320187706355
 {noformat}
 It's in:
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and it may be why the
 test does not end at all. Adding a maximum number of retries to
 Threads#threadDumpingIsAlive could help.
 I also wonder if the root cause of the hang is my modification to the WAL,
 with some threads surprised to see updates that were not written to the WAL.
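
 The "maximum retry" idea could look roughly like the sketch below. It is not
 the code of Threads#threadDumpingIsAlive and not the attached patch: the
 class BoundedJoin, the method joinWithMaxAttempts and its parameters are
 hypothetical names, and a one-line message stands in for the full thread
 dump. The point is only that the join loop gives up after a fixed number of
 intervals instead of waiting forever on a master thread that never exits.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of a bounded "join and report" loop.
final class BoundedJoin {

    /**
     * Waits for t to terminate, re-checking every intervalMs, for at most
     * maxAttempts intervals. Returns true if the thread terminated, false if
     * we gave up or were interrupted while it was still alive.
     */
    static boolean joinWithMaxAttempts(Thread t, long intervalMs, int maxAttempts) {
        if (intervalMs <= 0) {
            // join(0) would wait forever, which is what we want to avoid.
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        for (int attempt = 1; attempt <= maxAttempts && t.isAlive(); attempt++) {
            try {
                t.join(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                break;
            }
            if (t.isAlive()) {
                // The real method prints a thread dump here; this sketch only logs.
                System.err.println("Still waiting on " + t.getName()
                        + " (attempt " + attempt + " of " + maxAttempts + ")");
            }
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        CountDownLatch never = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            try {
                never.await(); // simulates a master blocked on a tracker
            } catch (InterruptedException e) {
                // Interrupted by main below: let the thread exit.
            }
        }, "stuck-master");
        stuck.setDaemon(true);
        stuck.start();
        System.out.println("finished=" + joinWithMaxAttempts(stuck, 50, 3));
        stuck.interrupt();
    }
}
```

 With a bound like this, a tearDown that calls the join would return after
 maxAttempts intervals and the caller could then fail the test explicitly
 rather than hang the build.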
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)

[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-02 Thread gaojinchao (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142012#comment-13142012
 ] 

gaojinchao commented on HBASE-4577:
---

My local test result:

Running org.apache.hadoop.hbase.TestMultiVersions
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.045 sec

Results :

Failed tests:   testHBaseFsck(org.apache.hadoop.hbase.util.TestHBaseFsck): 
expected:0 but was:1

Tests in error:
  
testMasterFailoverWithMockedRITOnDeadRS(org.apache.hadoop.hbase.master.TestMasterFailover):
 test timed out after 18 milliseconds
  
testEnableTableRoundRobinAssignment(org.apache.hadoop.hbase.client.TestAdmin): 
org.apache.hadoop.hbase.TableNotEnabledException: testEnableTableAssignment
  
testBadOriginalRootLocation(org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster):
 unknown host: example.org

Tests run: 1073, Failures: 1, Errors: 3, Skipped: 9



 Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
 -

 Key: HBASE-4577
 URL: https://issues.apache.org/jira/browse/HBASE-4577
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch


 Minor issue while looking at the RS metrics:
 bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
 storefileSizeMB=2420, compressionRatio=1.0008
 I guess there's a truncation somewhere when it's adding the numbers up.
 FWIW there's no compression on that table.
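
 One way a truncation could produce this kind of gap is sketched below. This
 is purely illustrative and is not the region server metrics code: the class
 StoreFileSizeTruncation and both method names are hypothetical, and it is
 only an assumption that the two metrics are rounded at different points. If
 one figure truncates each store file to whole MB before summing while the
 other sums bytes and truncates once, then with 8 store files the first can
 come out several MB lower than the second even though the underlying byte
 counts are identical.

```java
// Hypothetical illustration of truncation order; not HBase code.
final class StoreFileSizeTruncation {
    private static final long MB = 1024L * 1024L;

    /** Truncates each file size to whole MB first, then sums. */
    static long sumOfTruncatedMb(long[] sizesInBytes) {
        long totalMb = 0;
        for (long size : sizesInBytes) {
            totalMb += size / MB; // integer division drops the fraction per file
        }
        return totalMb;
    }

    /** Sums the byte sizes first, then truncates once. */
    static long truncatedSumMb(long[] sizesInBytes) {
        long totalBytes = 0;
        for (long size : sizesInBytes) {
            totalBytes += size;
        }
        return totalBytes / MB;
    }

    public static void main(String[] args) {
        // Eight files of 1.5 MB each (made-up sizes).
        long[] sizes = new long[8];
        java.util.Arrays.fill(sizes, MB + MB / 2);
        System.out.println("per-file truncation: " + sumOfTruncatedMb(sizes) + " MB");
        System.out.println("truncate after sum:  " + truncatedSumMb(sizes) + " MB");
    }
}
```

 In this made-up case the per-file figure is 8 MB and the summed figure is
 12 MB, so a 2 MB difference across 8 real store files is well within what
 truncation order alone could explain.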


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142011#comment-13142011
 ] 

nkeywal commented on HBASE-4724:


I don't see the same behavior in the global build. I'm going to try
removing the modifications linked to the WAL in admin.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch



[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB


[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142015#comment-13142015
 ] 

Ted Yu commented on HBASE-4577:
---

@Jinchao:
Please start with test output of 
TestHFileOutputFormat#testMRIncrementalLoadWithSplit and see why the test 
failed.

 Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
 -

 Key: HBASE-4577
 URL: https://issues.apache.org/jira/browse/HBASE-4577
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch



[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142013#comment-13142013
 ] 

Hadoop QA commented on HBASE-4724:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12501929/2002_4724_TestAdmin.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/134//console

This message is automatically generated.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch



[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation

[
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142016#comment-13142016
]

Hadoop QA commented on HBASE-4713:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501928/HBASE-4713-patch.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/135//console

This message is automatically generated.

Raise debug level to warn on ExecutionException in
HConnectionManager$HConnectionImplementation
---

Key: HBASE-4713
URL: https://issues.apache.org/jira/browse/HBASE-4713
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
Attachments: HBASE-4713-patch.txt


[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Status: Open  (was: Patch Available)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch



[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Status: Patch Available  (was: Open)

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch



[jira] [Updated] (HBASE-4724) TestAdmin hangs randomly in trunk


 [ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-4724:
---

Attachment: 2002_4724_TestAdmin.v2.patch

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 It's in:
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and maybe why it does not 
 end at all. Adding a maximum retry count to Threads#threadDumpingIsAlive could help.
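
As a rough illustration of the maximum-retry idea, here is a minimal Java sketch of a bounded join loop. BoundedJoin and joinWithMaxRetries are hypothetical names, not the actual Threads#threadDumpingIsAlive code, and the sketch prints a single line where the real helper would print a full thread dump.

```java
/** Hypothetical helper illustrating a bounded wait; not the real HBase Threads class. */
final class BoundedJoin {
  private BoundedJoin() {}

  /**
   * Waits for t to die, reporting after each unsuccessful wait, and gives up
   * after maxRetries waits. Returns true only if the thread has terminated.
   */
  static boolean joinWithMaxRetries(Thread t, long waitMillis, int maxRetries) {
    if (waitMillis <= 0) {
      // Thread.join(0) waits forever, which would defeat the bound.
      throw new IllegalArgumentException("waitMillis must be positive");
    }
    for (int attempt = 1; attempt <= maxRetries && t.isAlive(); attempt++) {
      try {
        t.join(waitMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        break;
      }
      if (t.isAlive()) {
        // The real helper would dump all thread stacks here.
        System.err.println("Still waiting on " + t.getName()
            + " (attempt " + attempt + " of " + maxRetries + ")");
      }
    }
    return !t.isAlive();
  }
}
```

With a bound like this, a tearDown that waits on a stuck master thread would eventually return false and let the test fail instead of hanging forever.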
 I also wonder if the root cause of the hang is my modification to the WAL, 
 with some threads surprised to find updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
 nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 
 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)
 
 org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRoot(CatalogTracker.java:277)

[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation


[ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142022#comment-13142022
 ] 

Ted Yu commented on HBASE-4713:
---

@Lucian:
Your attachment is a diff file, which HadoopQA does not recognize.
Can you generate a patch?

See http://wiki.apache.org/hadoop/Hbase/HowToContribute
Once you have checked out the source code and made the above modification, you can use 
'svn diff > 4713.patch' to obtain the patch.

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Attachments: HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server ==> 
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
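
To make the request concrete, here is a minimal Java sketch of logging an ExecutionException at warn level rather than debug. It is not the HConnectionManager code: java.util.logging stands in for HBase's logging, and WarnOnExecutionFailure and callAndWarn are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example; not the HConnectionImplementation code. */
final class WarnOnExecutionFailure {
  private static final Logger LOG =
      Logger.getLogger(WarnOnExecutionFailure.class.getName());

  private WarnOnExecutionFailure() {}

  /** Runs the task on the pool; returns its result, or null after logging a failure. */
  static <T> T callAndWarn(ExecutorService pool, Callable<T> task) {
    Future<T> future = pool.submit(task);
    try {
      return future.get();
    } catch (ExecutionException e) {
      // WARNING passes the default INFO threshold; a FINE (debug) record would be dropped.
      LOG.log(Level.WARNING, "Task failed", e.getCause());
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

With the logger left at its default info threshold, a failure such as a SocketTimeoutException raised inside the task shows up in the client log, which is the behaviour the issue asks for.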


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142026#comment-13142026
 ] 

Ted Yu commented on HBASE-4716:
---

closeBulkRegionOperation() is at the beginning of the finally block.

For the alternate code path, we only take one lock. That lock would be released 
in closeBulkRegionOperation() accordingly.

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.
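
The locking choice described above can be sketched in Java roughly as follows. This is an illustration only: BulkLoadLocking and its methods are hypothetical and do not reproduce HRegion's startBulkRegionOperation/closeBulkRegionOperation. It assumes a multi-family load still takes the write lock while a single-family load takes only the read lock, and that whichever lock was taken is released first thing in the finally block.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical region stand-in; not the HRegion implementation. */
final class BulkLoadLocking {
  private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

  /** Runs the load under the write lock for multi-family loads, the read lock otherwise. */
  void bulkLoad(int familyCount, Runnable load) {
    boolean writeLockNeeded = familyCount > 1;
    Lock lock = writeLockNeeded ? regionLock.writeLock() : regionLock.readLock();
    lock.lock();
    try {
      load.run();
    } finally {
      // Only one lock was taken, so exactly that lock is released here.
      lock.unlock();
    }
  }

  int readHolds() {
    return regionLock.getReadLockCount();
  }

  boolean writeHeld() {
    return regionLock.isWriteLocked();
  }
}
```

Holding only the read lock lets single-family bulk loads proceed alongside other readers of the region lock instead of serializing behind a writer.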


[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.


[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142033#comment-13142033
 ] 

Hudson commented on HBASE-1744:
---

Integrated in HBase-TRUNK #2399 (See 
[https://builds.apache.org/job/HBase-TRUNK/2399/])
HBASE-1744  Thrift server to match the new java api (Tim Sell)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/bin/hbase
* /hbase/trunk/src/examples/thrift2
* /hbase/trunk/src/examples/thrift2/DemoClient.java
* /hbase/trunk/src/examples/thrift2/DemoClient.py
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftHBaseServiceHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/ThriftUtilities.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumn.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDelete.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TDeleteType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TGet.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIOError.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIllegalArgument.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TIncrement.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TPut.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TResult.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TScan.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TTimeRange.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/thrift2/package.html
* /hbase/trunk/src/main/resources/org/apache/hadoop/hbase/thrift2
* /hbase/trunk/src/main/resources/org/apache/hadoop/hbase/thrift2/hbase.thrift
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java


 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: 
 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
 HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
 HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
 HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
 HBASE-1744.preview.1.patch, thriftexperiment.patch


 This mutateRows, etc.. is a little confusing compared to the new cleaner java 
 client.
 Thinking of ways to make a thrift client that is just as elegant. something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose rpc than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
 still be intuitive from say python.
 Presumably the goal of a thrift gateway is to be easy first.
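
For readers more at home in Java than in Thrift IDL, the proposed shape (a put that maps each family/qualifier/timestamp column to a value) can be mirrored roughly as below. ColumnKey and PutSketch are hypothetical names, not the generated thrift2 classes, and ByteBuffer is used on the assumption that column keys should compare by content, which raw byte[] does not do in Java.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical Java mirror of the proposed TColumn struct. */
final class ColumnKey {
  final ByteBuffer family;
  final ByteBuffer qualifier;
  final long timestamp;

  ColumnKey(byte[] family, byte[] qualifier, long timestamp) {
    // ByteBuffer compares by content, unlike byte[], so it can be part of a map key.
    this.family = ByteBuffer.wrap(family.clone());
    this.qualifier = ByteBuffer.wrap(qualifier.clone());
    this.timestamp = timestamp;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ColumnKey)) {
      return false;
    }
    ColumnKey other = (ColumnKey) o;
    return family.equals(other.family)
        && qualifier.equals(other.qualifier)
        && timestamp == other.timestamp;
  }

  @Override
  public int hashCode() {
    return Objects.hash(family, qualifier, timestamp);
  }
}

/** Hypothetical mirror of the proposed TPut struct: one row, many column values. */
final class PutSketch {
  final ByteBuffer row;
  final Map<ColumnKey, ByteBuffer> values = new LinkedHashMap<>();

  PutSketch(byte[] row) {
    this.row = ByteBuffer.wrap(row.clone());
  }

  PutSketch add(byte[] family, byte[] qualifier, long timestamp, byte[] value) {
    values.put(new ColumnKey(family, qualifier, timestamp), ByteBuffer.wrap(value.clone()));
    return this;
  }
}
```

Keeping the timestamp inside the column key is what makes the flat map workable: two values for the same family and qualifier at different timestamps are simply two entries.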


[jira] [Commented] (HBASE-4722) TestGlobalMemStoreSize has started failing


[ 
https://issues.apache.org/jira/browse/HBASE-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142034#comment-13142034
 ] 

Hudson commented on HBASE-4722:
---

Integrated in HBase-TRUNK #2399 (See 
[https://builds.apache.org/job/HBase-TRUNK/2399/])
HBASE-4722 TestGlobalMemStoreSize has started failing; commit some extra 
logging to help debug whats going on up on jenkins

stack : 
Files : 
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


 TestGlobalMemStoreSize has started failing
 --

 Key: HBASE-4722
 URL: https://issues.apache.org/jira/browse/HBASE-4722
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Attachments: logging-v2.txt, logging.txt


 I'm digging in.  It fails occasionally for me locally too.


[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142035#comment-13142035
 ] 

Ted Yu commented on HBASE-4377:
---

Integrated to 0.90, 0.92 and TRUNK.

Thanks for the patch Jonathan.

 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
 EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
 hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, 
 hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch


 In a worst case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-11-02 Thread Lucian George Iordache (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

Attachment: 1744.addendum

Addendum that makes TestThriftHBaseServiceHandler immune to hanging minicluster.

 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: 
 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
 HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
 HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
 HBASE-1744.preview.1.patch, thriftexperiment.patch


 This mutateRows, etc.. is a little confusing compared to the new cleaner java 
 client.
 Thinking of ways to make a thrift client that is just as elegant. something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose rpc than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
 still be intuitive from say python.
 Presumably the goal of a thrift gateway is to be easy first.


[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation


 [ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucian George Iordache updated HBASE-4713:
--

Status: Open  (was: Patch Available)

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Attachments: HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server ==> 
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E


[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation

2011-11-02 Thread Lucian George Iordache (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucian George Iordache updated HBASE-4713:
--

Status: Patch Available  (was: Open)

Try 4713.patch

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Attachments: 4713.patch, HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server ==> 
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E


[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation

2011-11-02 Thread Lucian George Iordache (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucian George Iordache updated HBASE-4713:
--

Attachment: 4713.patch

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Attachments: 4713.patch, HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server ==> 
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E


[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142071#comment-13142071
 ] 

Hudson commented on HBASE-4377:
---

Integrated in HBase-0.92 #98 (See 
[https://builds.apache.org/job/HBase-0.92/98/])
HBASE-4377  [hbck] Offline rebuild .META. from fs data only
   (Jonathan Hsieh)
HBASE-4377  [hbck] Offline rebuild .META. from fs data only
   (Jonathan Hsieh) (detail)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildOverlap.java


 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90-v4.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.0.90.v3.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data-.trunk.v3.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v1.patch, 
 0001-HBASE-4377-hbck-Offline-rebuild-.META.-from-fs-data.0.92.v2.patch, 
 EXT_AC.regioninfo, EXT_ATU_05f84d32cbc0bdabf00e00bc2f3570f0.regioninfo, 
 hbase-4377-trunk.v2.patch, hbase-4377.0.90.v6.patch, hbase-4377.trunk.v3.txt, 
 hbase-4377.trunk.v4.txt, hbase-4377.trunk.v5.txt, hbase-4377.trunk.v6.patch


 In a worst case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Updated] (HBASE-3601) TestMasterFailover broken in TRUNK


 [ 
https://issues.apache.org/jira/browse/HBASE-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3601:
--

Fix Version/s: 0.92.0

 TestMasterFailover broken in TRUNK
 --

 Key: HBASE-3601
 URL: https://issues.apache.org/jira/browse/HBASE-3601
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0


 After HBASE-3573 went in, TestMasterFailover broke.  The change in shutdown 
 technique revealed an issue with our in-memory accounting when a master joins 
 an already running cluster; we don't add .META. and -ROOT- to our set of online 
 regions in the new master, which could make for some interesting issues as the 
 new master progressed (the previous shutdown did a count of remaining servers; 
 the new shutdown process looks at in-memory state to see if only catalog-carrying 
 regionservers are online... this is what was going out of whack in the new master).


[jira] [Updated] (HBASE-3605) Fix balancer log message


 [ 
https://issues.apache.org/jira/browse/HBASE-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3605:
--

Fix Version/s: 0.92.0

 Fix balancer log message
 

 Key: HBASE-3605
 URL: https://issues.apache.org/jira/browse/HBASE-3605
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.92.0


 From Gaojinchao up on user list:
 In the balanceCluster function, it should be " leastloaded=" + 
 serversByLoad.firstKey().getLoad().getNumberOfRegions())
 {code}
 if (serversByLoad.lastKey().getLoad().getNumberOfRegions() <= max &&
     serversByLoad.firstKey().getLoad().getNumberOfRegions() >= min) {
   // Skipped because no server outside (min,max) range
   LOG.info("Skipping load balancing.  servers=" + numServers + " " +
       "regions=" + numRegions + " average=" + average + " " +
       "mostloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions() +
       " leastloaded=" + serversByLoad.lastKey().getLoad().getNumberOfRegions());
   return null;
 }
 {code}
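
A small self-contained Java sketch of the message with the suggested fix applied, so that leastloaded is read from firstKey() and mostloaded from lastKey(). This is a simplification and not the balancer code: BalancerLogMessage and skipMessage are hypothetical, and the sorted map is keyed directly by region count (so servers with equal counts would collide) rather than by server info objects.

```java
import java.util.TreeMap;

/** Hypothetical illustration of the corrected log message; not the HBase balancer. */
final class BalancerLogMessage {
  private BalancerLogMessage() {}

  /**
   * Builds the "skipping" message from a non-empty map sorted by region count,
   * so firstKey() is the least loaded entry and lastKey() the most loaded one.
   */
  static String skipMessage(TreeMap<Integer, String> serversByLoad, int numRegions) {
    int numServers = serversByLoad.size();
    float average = (float) numRegions / numServers;
    return "Skipping load balancing.  servers=" + numServers + " "
        + "regions=" + numRegions + " average=" + average + " "
        + "mostloaded=" + serversByLoad.lastKey()
        + " leastloaded=" + serversByLoad.firstKey();
  }
}
```

With the reported version, both fields would print the same number, which hides how far apart the least and most loaded servers really are.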


[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation

[
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142119#comment-13142119
]

Hadoop QA commented on HBASE-4713:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501940/4713.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javadoc. The javadoc tool appears to have generated -165 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 42 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.master.TestMasterFailover

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/138//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/138//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/138//console

This message is automatically generated.

Raise debug level to warn on ExecutionException in
HConnectionManager$HConnectionImplementation
---

[jira] [Updated] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation


 [ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4713:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thank you for the patch, Lucian George Iordache.  I applied it to trunk and the 
0.92 branch.

 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Fix For: 0.92.0

 Attachments: 4713.patch, HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, and it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server ==> 
 SocketTimeoutException logged at debug
 With the log level at info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142172#comment-13142172
 ] 

stack commented on HBASE-4716:
--

Thanks. +1 on commit.

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead


 [ 
https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4609:
--

Status: Open  (was: Patch Available)

 ThriftServer.getRegionInfo() is expecting old ServerName format, need to use 
 new Addressing class instead
 -

 Key: HBASE-4609
 URL: https://issues.apache.org/jira/browse/HBASE-4609
 Project: HBase
  Issue Type: Bug
  Components: thrift
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4609-v1.patch


 ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't 
 include start code.  Need to fix.


[jira] [Commented] (HBASE-4577) Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB

2011-11-02 Thread gaojinchao (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142187#comment-13142187
 ] 

gaojinchao commented on HBASE-4577:
---

Sorry, I am not familiar with MR. I will continue digging into this issue tomorrow.

 Region server reports storefileSizeMB bigger than storefileUncompressedSizeMB
 -

 Key: HBASE-4577
 URL: https://issues.apache.org/jira/browse/HBASE-4577
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4577_trial_Trunk.patch, HBASE-4577_trunk.patch


 Minor issue while looking at the RS metrics:
 bq. numberOfStorefiles=8, storefileUncompressedSizeMB=2418, 
 storefileSizeMB=2420, compressionRatio=1.0008
 I guess there's a truncation somewhere when it's adding the numbers up.
 FWIW there's no compression on that table.
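One way such a mismatch can arise, shown as a sketch rather than a diagnosis of the region server code: if one metric truncates each file to whole MB before summing and the other sums bytes first, the two totals differ even for identical files. The class SizeRollup and the file sizes are made up for illustration:

```java
/** Hypothetical illustration of truncation order; not HBase metrics code. */
final class SizeRollup {
  static final long MB = 1024L * 1024L;

  private SizeRollup() {}

  /** Truncates every file to whole MB, then sums; loses up to 1 MB per file. */
  static long sumOfTruncatedMb(long[] fileBytes) {
    long total = 0;
    for (long bytes : fileBytes) {
      total += bytes / MB;
    }
    return total;
  }

  /** Sums the bytes first and truncates once at the end. */
  static long truncatedSumMb(long[] fileBytes) {
    long total = 0;
    for (long bytes : fileBytes) {
      total += bytes;
    }
    return total / MB;
  }
}
```

For eight files of 1.5 MB each, sumOfTruncatedMb gives 8 while truncatedSumMb gives 12, so a ratio computed from two such totals can land on either side of 1.0 even without compression.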


[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead


 [ 
https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4609:
--

Attachment: 4609-v2.txt

 ThriftServer.getRegionInfo() is expecting old ServerName format, need to use 
 new Addressing class instead
 -

 Key: HBASE-4609
 URL: https://issues.apache.org/jira/browse/HBASE-4609
 Project: HBase
  Issue Type: Bug
  Components: thrift
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
 Fix For: 0.92.0

 Attachments: 4609-v2.txt, HBASE-4609-v1.patch


 ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't 
 include start code.  Need to fix.


[jira] [Updated] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead


 [ 
https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4609:
--

Status: Patch Available  (was: Open)



[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.


[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142197#comment-13142197
 ] 

Ted Yu commented on HBASE-1744:
---

From 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/2399/artifact/trunk/target/surefire-reports/org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler-output.txt:
{code}
2011-11-02 09:18:25,930 INFO  [main] zookeeper.MiniZooKeeperCluster(141): 
Failed binding ZK Server to client port: 21818
2011-11-02 09:18:25,958 INFO  [main] zookeeper.MiniZooKeeperCluster(164): 
Started MiniZK Cluster and connect 1 ZK server on client port: 21819
{code}
Let's see what happens in build 2400.

 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: 
 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
 HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
 HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
 HBASE-1744.preview.1.patch, thriftexperiment.patch


 This mutateRows, etc. is a little confusing compared to the new, cleaner Java 
 client.
 Thinking of ways to make a Thrift client that is just as elegant. Something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose RPC than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
 still be intuitive from, say, Python.
 Presumably the goal of a Thrift gateway is to be easy first.


[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.


[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142209#comment-13142209
 ] 

Hudson commented on HBASE-1744:
---

Integrated in HBase-TRUNK #2400 (See 
[https://builds.apache.org/job/HBase-TRUNK/2400/])
HBASE-1744 HBaseAdmin ctor should obtain Configuration from 
HBaseTestingUtility

tedyu : 
Files : 
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/thrift2/TestThriftHBaseServiceHandler.java




[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142215#comment-13142215
 ] 

stack commented on HBASE-4724:
--

I applied your restore of WAL behavior in v2.  Let's see how it plays out on 
jenkins.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 it's in
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and it may be why it does not 
 end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help.
 I also wonder if the root cause of the hang is my modification to the WAL, 
 with some threads surprised to have updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
 nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 
 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142217#comment-13142217
 ] 

stack commented on HBASE-4724:
--

Oh, you need to do --no-prefix when making patches for patch-build.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 it's in
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and it may be why it does not 
 end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help.
 I also wonder if the root cause of the hang is my modification to the WAL, 
 with some threads surprised to have updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
 nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 
 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142229#comment-13142229
 ] 

stack commented on HBASE-4724:
--

@N I don't have version mismatch when I run TestAdmin.  It doesn't fail for me 
either.

bq. Adding a maximum retry to Threads#threadDumpingIsAlive could help.

Yes.  We should do this.  No point in going on after we've thread dumped three 
times I'd say.
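A minimal sketch of the bounded retry discussed above, assuming a standalone helper rather than the real org.apache.hadoop.hbase.util.Threads class. The names BoundedJoin and joinWithDumps are hypothetical, and the dump action is passed in as a Runnable instead of calling ReflectionUtils.printThreadInfo:

```java
/** Hypothetical helper; not the actual Threads#threadDumpingIsAlive code. */
final class BoundedJoin {
  private BoundedJoin() {}

  /**
   * Waits for the thread to finish, running the dump action after each wait
   * that times out, and gives up after maxDumps dumps.
   * waitMillis must be positive, because Thread.join(0) waits forever.
   *
   * @return true if the thread finished, false if we gave up or were interrupted
   */
  static boolean joinWithDumps(Thread t, long waitMillis, int maxDumps, Runnable dump) {
    for (int dumps = 0; dumps < maxDumps; dumps++) {
      try {
        t.join(waitMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt for the caller and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
      if (!t.isAlive()) {
        return true;
      }
      dump.run();
    }
    return !t.isAlive();
  }
}
```

With maxDumps set to 3 and a 60 second wait, a stuck master would produce three thread dumps and then let the test teardown fail instead of hanging.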


[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

[
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142236#comment-13142236
]

stack commented on HBASE-4583:
--

bq. There are various ways to produce serializable schedules (pessimistic
locking, optimistic locking with rechecking of pre conditions, snapshot
isolation, etc), all which will probably mean worse performance for both append
and increment.

Shouldn't we do it anyways (though big yuck on your list above -- it makes my
brain hurt just thinking on it. Can you imagine rechecking pre-conditions and
then replaying the failed transaction.. how much fun that'll be to code up!)?

Shouldn't we be correct first and then performant?

bq. As said above the current implementation sync's the WAL after the memstore
is updated and the new values are visible to other threads, and after the locks
are released.

Sounds broke to me; sounds like big compromise for sake of better perf. Should
we open new issue on this?

bq. (1) and (2) together mean that the WAL needs to be sync'ed with the row
lock held (which would be quite a performance degradation).

Shouldn't we ship with this config. with options to run hbase otherwise
(memstore put then sync, etc.)

bq. Now, what we could do is use rwcc to make the changes to the CFs atomic,
and still sync the WAL after all the locks are released (as we do now). With
this compromise everything would be correct unless the sync'ing of WAL fails

Sounds broke still?

Thanks for the write up and for digging in here fellas.
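The "correct first" ordering asked about above can be sketched as a toy model. This assumes a single lock standing in for the row lock, a map standing in for the memstore, and a hypothetical Wal interface; none of it is the HRegion implementation. The point is only the order: the WAL is appended and synced while the lock is held, and the new value becomes visible afterwards:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of sync-before-visible; all names here are hypothetical. */
final class SyncThenApply {
  /** Stand-in for a write-ahead log whose sync may fail. */
  interface Wal {
    void appendAndSync(String row, long value) throws IOException;
  }

  // Simplification: one lock for every row instead of per-row locks.
  private final ReentrantLock rowLock = new ReentrantLock();
  private final Map<String, Long> memstore = new HashMap<>();
  private final Wal wal;

  SyncThenApply(Wal wal) {
    this.wal = wal;
  }

  long increment(String row, long delta) throws IOException {
    rowLock.lock();
    try {
      long next = memstore.getOrDefault(row, 0L) + delta;
      wal.appendAndSync(row, next); // durable first, lock still held
      memstore.put(row, next);      // visible only after the sync succeeded
      return next;
    } finally {
      rowLock.unlock();
    }
  }

  Long get(String row) {
    rowLock.lock();
    try {
      return memstore.get(row);
    } finally {
      rowLock.unlock();
    }
  }
}
```

If the sync throws, the memstore is left untouched, so no reader sees a value that was never made durable. The cost is the one named in the quoted text: every writer to the row waits for the sync.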

Integrate RWCC with Append and Increment operations
---

Key: HBASE-4583
URL: https://issues.apache.org/jira/browse/HBASE-4583
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.94.0

Attachments: 4583-v2.txt, 4583-v3.txt, 4583-v4.txt, 4583.txt

Currently Increment and Append operations do not work with RWCC and hence a
client could see the results of multiple such operation mixed in the same
Get/Scan.
The semantics might be a bit more interesting here as upsert adds and removes
to and from the memstore.

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142239#comment-13142239
 ] 

stack commented on HBASE-4724:
--

Before applying the patch, I got the below on a random run:

{code}Tests in error: 
  testDisableAndEnableTable(org.apache.hadoop.hbase.client.TestAdmin): 
org.apache.hadoop.hbase.TableNotEnabledException: 
testDisableAndEnableTable{code}

Trying w/ your v2 patch now.


[jira] [Commented] (HBASE-4480) Testing script to simplify local testing


[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142242#comment-13142242
 ] 

stack commented on HBASE-4480:
--

I like this script.  We should check it in under new 'dev-support' dir?  The 
usage is a bit off.  It says '-n=N' when I think it means to say '-n N'

 Testing script to simplify local testing
 

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: HBASE-4480.patch, HBASE-4480_v2.patch, 
 HBASE-4480_v3.patch, runtest-no-npe-check.sh, runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.


[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky


[ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142244#comment-13142244
 ] 

stack commented on HBASE-4518:
--

I shouldn't have said anything.  My mention of this issue caused the test to 
start failing up on jenkins again.

 TestServerCustomProtocol is flaky
 -

 Key: HBASE-4518
 URL: https://issues.apache.org/jira/browse/HBASE-4518
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt


 TestServerCustomProtocol has been showing some intermittent failures in 
 Jenkins due to what looks like region transitions.
 Here is the most recent failure:
 {noformat}
 Results :
 Failed tests:   
 testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
 Results should contain region 
 test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
 {noformat}


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142249#comment-13142249
 ] 

nkeywal commented on HBASE-4724:


I recloned the repo, and now it works all the time.

I have this in the logs; it seems to be new to me:
{noformat}
2011-11-02 03:36:17,271 DEBUG [main] zookeeper.ZKUtil(1034): hconnection 
Retrieved 29 byte(s) of data from znode /hbase/root-region-server and set 
watcher; localhost,39688,1320229746489
2011-11-02 03:36:17,272 DEBUG [Finalizer] 
client.HConnectionManager$HConnectionImplementation(1715): The connection to 
null has been closed.
2011-11-02 03:36:17,272 DEBUG [Finalizer] 
client.HConnectionManager$HConnectionImplementation(1734): The connection to 
null was closed by the finalize method.
2011-11-02 03:36:17,272 DEBUG [Finalizer] 
client.HConnectionManager$HConnectionImplementation(1715): The connection to 
null has been closed.
2011-11-02 03:36:17,272 DEBUG [Finalizer] 
client.HConnectionManager$HConnectionImplementation(1734): The connection to 
null was closed by the finalize method.
{noformat}

Could it have a side effect in some cases?

FWIW, I also got this once, but the test case succeeded anyway. It is something 
I have seen in the past already.
{noformat}
2011-11-02 02:08:13,311 ERROR 
[MASTER_OPEN_REGION-localhost,40499,132022456-3] 
executor.EventHandler(171): Caught throwable while processing event 
RS_ZK_REGION_OPENED
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.AssignmentManager.updateTimers(AssignmentManager.java:1059)
at 
org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1033)
at 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler.process(OpenedRegionHandler.java:105)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2011-11-02 02:08:13,311 INFO  [RS_OPEN_REGION-localhost,47967,132022999-2] 
regionserver.HRegion(402): Setting up tabledescriptor config now ...
{noformat}

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 it's in
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and it may be why the test 
 does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive 
 could help.
 I also wonder if the root cause of the hang is my modification to the WAL, 
 with some threads surprised to have updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142251#comment-13142251
 ] 

stack commented on HBASE-4724:
--

Let me look at the above.

I reran the TestAdmin a few times and got this again:

{code}
---
Test set: org.apache.hadoop.hbase.client.TestAdmin
---
Tests run: 33, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 509.79 sec  
FAILURE!
testEnableDisableAddColumnDeleteColumn(org.apache.hadoop.hbase.client.TestAdmin)
  Time elapsed: 0.841 sec   ERROR!
org.apache.hadoop.hbase.TableNotEnabledException: 
org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin
{code}

Do you think some of the cuts in timers are still too aggressive?


[jira] [Commented] (HBASE-4609) ThriftServer.getRegionInfo() is expecting old ServerName format, need to use new Addressing class instead

[
https://issues.apache.org/jira/browse/HBASE-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142255#comment-13142255
]

Hadoop QA commented on HBASE-4609:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501968/4609-v2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javadoc. The javadoc tool appears to have generated -165 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 42 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.master.TestMasterFailover

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/139//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/139//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/139//console

This message is automatically generated.

ThriftServer.getRegionInfo() is expecting old ServerName format, need to use
new Addressing class instead
-

Key: HBASE-4609
URL: https://issues.apache.org/jira/browse/HBASE-4609
Project: HBase
Issue Type: Bug
Components: thrift
Affects Versions: 0.92.0, 0.94.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Priority: Minor
Fix For: 0.92.0

Attachments: 4609-v2.txt, HBASE-4609-v1.patch

ThriftServer.getRegionInfo() is expecting the old ServerName that doesn't
include start code. Need to fix.
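
For illustration, the server names that appear in the logs elsewhere in this
digest (for example localhost,39664,1320187706355) carry three comma-separated
fields: host, port and start code. A minimal sketch of reading that form is
below. ServerNameSketch and its methods are hypothetical names made up for this
example; this is not the HBase ServerName or Addressing API, and the actual
patch may look quite different.

```java
// Hypothetical helper for illustration only; NOT the HBase ServerName/Addressing API.
final class ServerNameSketch {
    final String host;
    final int port;
    final long startCode;

    ServerNameSketch(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    /** Parses "host,port,startcode"; rejects anything without exactly three fields. */
    static ServerNameSketch parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected host,port,startcode but got: " + serverName);
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so malformed numbers surface the same way as a wrong field count.
        return new ServerNameSketch(
            parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    /** Host and port only, for callers that do not care about the start code. */
    String hostAndPort() {
        return host + ":" + port;
    }
}
```

A caller that assumes there is no start code would trip over the third field,
which is the kind of mismatch the issue describes.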

[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142257#comment-13142257
 ] 

Ted Yu commented on HBASE-4716:
---

Integrated to 0.92 and TRUNK.

Thanks for the review Todd and Stack.

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.
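
The locking idea can be sketched with a plain
java.util.concurrent.locks.ReentrantReadWriteLock. This is only an illustration
under assumptions: BulkLoadLockSketch and its methods are hypothetical, and the
choice to take the write lock when more than one family is loaded is an
assumption made for the example, not something the issue text states.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not the HRegion code touched by the patch.
final class BulkLoadLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Single-family loads share the read lock, so other readers are not blocked.
     * Assumption for this example: multi-family loads take the exclusive write lock.
     */
    Lock lockFor(int familyCount) {
        return familyCount > 1 ? lock.writeLock() : lock.readLock();
    }

    /** Runs the load while holding whichever lock lockFor() selected. */
    void bulkLoad(int familyCount, Runnable load) {
        Lock l = lockFor(familyCount);
        l.lock();
        try {
            load.run();
        } finally {
            l.unlock();
        }
    }

    boolean isWriteLocked() {
        return lock.isWriteLocked();
    }

    int readLockCount() {
        return lock.getReadLockCount();
    }
}
```

With the read lock, concurrent readers and other single-family loads can proceed
at the same time; only a writer would have to wait.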


[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load


 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)


[jira] [Created] (HBASE-4725) NPE in AM#updateTimers

2011-11-02 Thread stack (Created) (JIRA)

NPE in AM#updateTimers
--

 Key: HBASE-4725
 URL: https://issues.apache.org/jira/browse/HBASE-4725
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0





[jira] [Commented] (HBASE-4725) NPE in AM#updateTimers


[ 
https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142267#comment-13142267
 ] 

stack commented on HBASE-4725:
--

{code}
2011-11-02 02:08:13,311 ERROR 
[MASTER_OPEN_REGION-localhost,40499,132022456-3] 
executor.EventHandler(171): Caught throwable while processing event 
RS_ZK_REGION_OPENED
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.AssignmentManager.updateTimers(AssignmentManager.java:1059)
at 
org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1033)
at 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler.process(OpenedRegionHandler.java:105)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:168)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2011-11-02 02:08:13,311 INFO  [RS_OPEN_REGION-localho
{code}

 NPE in AM#updateTimers
 --

 Key: HBASE-4725
 URL: https://issues.apache.org/jira/browse/HBASE-4725
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: am.txt





[jira] [Updated] (HBASE-4725) NPE in AM#updateTimers


 [ 
https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4725:
-

Status: Patch Available  (was: Open)


[jira] [Updated] (HBASE-4725) NPE in AM#updateTimers


 [ 
https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4725:
-

Attachment: am.txt


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142269#comment-13142269
 ] 

stack commented on HBASE-4724:
--

I made HBASE-4725 for the NPE.  Looking at the null connection...

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 it's in
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and it may be why the test 
 does not end at all. Adding a maximum retry to Threads#threadDumpingIsAlive 
 could help.
 I also wonder if the root cause of the hang is my modification to the WAL, 
 with some threads surprised to have updates that were not written to the WAL. 
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
 nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 
 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:104)

[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142272#comment-13142272
 ] 

nkeywal commented on HBASE-4724:


The timers I changed in the admin all guard a condition, so instead of 
checking once per second they check 5 times per second. This should not change 
the final behaviour, and I took care not to change the final timeout. 
testEnableDisableAddColumnDeleteColumn was not directly impacted by 4703; 
there is no timer there. 
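
As a rough sketch of what "same timeout, shorter check interval" means (this is
not the code from the 4703 patch; PollingWaitSketch and waitFor are names made
up for this example):

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of condition polling; not taken from HBaseAdmin.
final class PollingWaitSketch {
    /**
     * Re-checks the condition every intervalMs until it holds or timeoutMs elapses.
     * Lowering intervalMs (say from 1000 to 200) only changes how quickly success
     * is noticed; the overall deadline stays the same.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(intervalMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

A call such as waitFor(check, 60000, 200) gives up after the same 60 seconds as
waitFor(check, 60000, 1000) but notices a passing check up to 800 ms sooner.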

What's the line that's throwing this exception?



[jira] [Commented] (HBASE-4415) Add configuration script for setup HBase (hbase-setup-conf.sh)


[ 
https://issues.apache.org/jira/browse/HBASE-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142284#comment-13142284
 ] 

stack commented on HBASE-4415:
--

Ditto on what Ted asks above.

Why would we have this duplicated (and now behind) hbase-env.sh over in 
src/packages/templates/conf/hbase-env.sh?  Why would we not copy it from 
original location?

@Andrew I see this patch has:

{code}
+  <property>
+    <name>hbase.master.kerberos.principal</name>
+    <value>${HBASE_M_K_PRINCIPAL}</value>
+    <description></description>
+  </property>
{code}

We need more than that now?


 Add configuration script for setup HBase (hbase-setup-conf.sh)
 --

 Key: HBASE-4415
 URL: https://issues.apache.org/jira/browse/HBASE-4415
 Project: HBase
  Issue Type: New Feature
  Components: scripts
Affects Versions: 0.90.4, 0.92.0
 Environment: Java 6, Linux
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-4415-1.patch, HBASE-4415-2.patch, 
 HBASE-4415-3.patch, HBASE-4415-4.patch, HBASE-4415-5.patch, 
 HBASE-4415-6.patch, HBASE-4415.patch


 The goal of this jira is to provide an installation script for configuring 
 the HBase environment and configuration, using the same *-setup-conf.sh 
 pattern as all Hadoop related projects.  For HBase, the usage of the 
 script looks like this:
 {noformat}
 usage: ./hbase-setup-conf.sh parameters
   Optional parameters:
     --hadoop-conf=/etc/hadoop                Set Hadoop configuration directory location
     --hadoop-home=/usr                       Set Hadoop directory location
     --hadoop-namenode=localhost              Set Hadoop namenode hostname
     --hadoop-replication=3                   Set HDFS replication
     --hbase-home=/usr                        Set HBase directory location
     --hbase-conf=/etc/hbase                  Set HBase configuration directory location
     --hbase-log=/var/log/hbase               Set HBase log directory location
     --hbase-pid=/var/run/hbase               Set HBase pid directory location
     --hbase-user=hbase                       Set HBase user
     --java-home=/usr/java/default            Set JAVA_HOME directory location
     --kerberos-realm=KERBEROS.EXAMPLE.COM    Set Kerberos realm
     --kerberos-principal-id=_HOST            Set Kerberos principal ID
     --keytab-dir=/etc/security/keytabs       Set keytab directory
     --regionservers=localhost                Set regionservers hostnames
     --zookeeper-home=/usr                    Set ZooKeeper directory location
     --zookeeper-quorum=localhost             Set ZooKeeper Quorum
     --zookeeper-snapshot=/var/lib/zookeeper  Set ZooKeeper snapshot location
 {noformat}


[jira] [Commented] (HBASE-4523) dfs.support.append config should be present in the hadoop configs, we should remove them from hbase so the user is not confused when they see the config in 2 places


[ 
https://issues.apache.org/jira/browse/HBASE-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142285#comment-13142285
 ] 

stack commented on HBASE-4523:
--

This patch looks fine.

 dfs.support.append config should be present in the hadoop configs, we should 
 remove them from hbase so the user is not confused when they see the config 
 in 2 places
 

 Key: HBASE-4523
 URL: https://issues.apache.org/jira/browse/HBASE-4523
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4, 0.92.0
Reporter: Arpit Gupta
Assignee: Eric Yang
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-4523.patch





[jira] [Commented] (HBASE-4535) hbase-env.sh in hbase rpm does not set HBASE_CONF_DIR


[ 
https://issues.apache.org/jira/browse/HBASE-4535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142286#comment-13142286
 ] 

stack commented on HBASE-4535:
--

Patch looks fine.  Should this be done over in original hbase-env.sh?

 hbase-env.sh in hbase rpm does not set HBASE_CONF_DIR
 -

 Key: HBASE-4535
 URL: https://issues.apache.org/jira/browse/HBASE-4535
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.90.3
Reporter: Ramya Sunil
Assignee: Eric Yang
 Attachments: HBASE-4535.patch


 After a hbase rpm install, hbase-env.sh does not define HBASE_CONF_DIR. This 
 needs to be fixed.


[jira] [Commented] (HBASE-4635) Remove dependency of java for rpm/deb packaging


[ 
https://issues.apache.org/jira/browse/HBASE-4635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142288#comment-13142288
 ] 

stack commented on HBASE-4635:
--

This has this change:

{code}
 . /etc/default/hadoop-env.sh
-. /etc/default/zookeeper-env.sh
{code}

... which is a little unrelated.  Are you trying to make this hbase-env.sh the 
same as the original?

 Remove dependency of java for rpm/deb packaging
 ---

 Key: HBASE-4635
 URL: https://issues.apache.org/jira/browse/HBASE-4635
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.92.0
 Environment: Java, Ubuntu, RHEL
Reporter: Eric Yang
Assignee: Eric Yang
 Attachments: HBASE-4635.patch


 Comment from HBASE-3606:
 Eric, it looks like the hbase rpm spec file sets a dependency on jdk. Can we 
 remove the jdk dependency? Not everyone will be installing jdk through rpm.
 There are multiple ways to install Java on Linux.  It would be better to 
 remove Java dependency declaration for packaging.


[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-02 Thread Gary Helmling (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-4518:
-

Attachment: HBASE-4518.patch

This patch cleans up the config of the PingHandler endpoint and changes the 
mini cluster to use a single region server to avoid region transition issues.  
With this patch I was able to run TestServerCustomProtocol in a batch of 50 
runs with no failures.

 TestServerCustomProtocol is flaky
 -

 Key: HBASE-4518
 URL: https://issues.apache.org/jira/browse/HBASE-4518
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4518.patch, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt


 TestServerCustomProtocol has been showing some intermittent failures in 
 Jenkins due to what looks like region transitions.
 Here is the most recent failure:
 {noformat}
 Results :
 Failed tests:   
 testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
 Results should contain region 
 test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
 {noformat}


[jira] [Updated] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-02 Thread Gary Helmling (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-4518:
-

Assignee: Gary Helmling
  Status: Patch Available  (was: Open)

 TestServerCustomProtocol is flaky
 -

 Key: HBASE-4518
 URL: https://issues.apache.org/jira/browse/HBASE-4518
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4518.patch, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt


 TestServerCustomProtocol has been showing some intermittent failures in 
 Jenkins due to what looks like region transitions.
 Here is the most recent failure:
 {noformat}
 Results :
 Failed tests:   
 testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
 Results should contain region 
 test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
 {noformat}


[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky


[ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142294#comment-13142294
 ] 

stack commented on HBASE-4518:
--

This line:

{code}
+util.getConfiguration().set(CoprocessorHost.REGION_COPROCESSOR_CONF_KEY,
+PingHandler.class.getName()); 
{code}

does what

{code}
-// TODO: use a test coprocessor for registration (once merged with CP code)
-// sleep here is an ugly hack to allow region transitions to finish
-Thread.sleep(5000);
-for (JVMClusterUtil.RegionServerThread t :
-  cluster.getRegionServerThreads()) {
-  for (HRegionInfo r : t.getRegionServer().getOnlineRegions()) {
-t.getRegionServer().getOnlineRegion(r.getRegionName())
-.registerProtocol(PingProtocol.class, new PingHandler());
-  }
-}  
{code}

... used to do?


If so, +1 on commit if patch-build gives back reasonable results.

 TestServerCustomProtocol is flaky
 -

 Key: HBASE-4518
 URL: https://issues.apache.org/jira/browse/HBASE-4518
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4518.patch, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt


 TestServerCustomProtocol has been showing some intermittent failures in 
 Jenkins due to what looks like region transitions.
 Here is the most recent failure:
 {noformat}
 Results :
 Failed tests:   
 testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
 Results should contain region 
 test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
 {noformat}


[jira] [Created] (HBASE-4726) RS should close region if it fails to mark it as 'OPENED'.

2011-11-02 Thread Madhuwanti Vaidya (Created) (JIRA)

RS should close region if it fails to mark it as 'OPENED'.
--

 Key: HBASE-4726
 URL: https://issues.apache.org/jira/browse/HBASE-4726
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.89.20100924
Reporter: Madhuwanti Vaidya
Assignee: Madhuwanti Vaidya
Priority: Minor


Currently if a RS fails to mark a region as 'OPENED' it only logs an error. It 
will leave the region open - this has caused duplicate region assignments in 
one of our production clusters. 


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142299#comment-13142299
 ] 

stack commented on HBASE-4724:
--

@N Trying to reproduce while testing the NPE fix at the same time.  It's not 
failing for me now; I'll let it run longer.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch


 From the logs in my env:
 {noformat}
 2011-11-01 15:48:40,744 WARN  [Master:0;localhost,39664,1320187706355] 
 master.AssignmentManager(1471): Failed assignment of -ROOT-,,0.70236052 to 
 localhost,44046,1320187706849, trying to assign elsewhere instead; retry=1
 org.apache.hadoop.hbase.ipc.HBaseRPC$VersionMismatch: Protocol 
 org.apache.hadoop.hbase.ipc.HRegionInterface version mismatch. (client = 28, 
 server = 29)
   at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:185)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:300)
 {noformat}
 Anyway, after this the log finishes with:
 {noformat}
 2011-11-01 15:54:35,132 INFO  
 [Master:0;localhost,39664,1320187706355.oldLogCleaner] hbase.Chore(80): 
 Master:0;localhost,39664,1320187706355.oldLogCleaner exiting
 Process Thread Dump: Automatic Stack Trace every 60 seconds waiting on 
 Master:0;localhost,39664,1320187706355
 {noformat}
 it's in
 {noformat}
 sun.management.ThreadImpl.getThreadInfo1(Native Method)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:156)
 sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:121)
 org.apache.hadoop.util.ReflectionUtils.printThreadInfo(ReflectionUtils.java:149)
 org.apache.hadoop.hbase.util.Threads.threadDumpingIsAlive(Threads.java:113)
 org.apache.hadoop.hbase.LocalHBaseCluster.join(LocalHBaseCluster.java:405)
 org.apache.hadoop.hbase.MiniHBaseCluster.join(MiniHBaseCluster.java:408)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniHBaseCluster(HBaseTestingUtility.java:616)
 org.apache.hadoop.hbase.HBaseTestingUtility.shutdownMiniCluster(HBaseTestingUtility.java:590)
 org.apache.hadoop.hbase.client.TestAdmin.tearDownAfterClass(TestAdmin.java:89)
 {noformat}
 So that's at least why adding a timeout won't help, and maybe why it does not 
 end at all. Adding a maximum retry to Threads#threadDumpingIsAlive could help.
 I also wonder if the root cause of the non-ending is my modification to the 
 WAL, with some threads surprised to find updates that were not written to it. 
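
The maximum-retry idea mentioned above could look roughly like the sketch below. This is only an illustration, not the actual Threads#threadDumpingIsAlive code: the class name BoundedJoin, the method joinWithRetries, and its parameters are hypothetical, and the log line merely stands in for the periodic thread dump.

```java
final class BoundedJoin {
  /**
   * Waits for the thread to die, giving up after maxRetries join attempts.
   * Returns true if the thread has finished, false if it is still alive.
   * intervalMs should be positive; Thread.join(0) would wait forever.
   */
  static boolean joinWithRetries(Thread t, long intervalMs, int maxRetries) {
    for (int attempt = 0; attempt < maxRetries && t.isAlive(); attempt++) {
      try {
        t.join(intervalMs); // wait at most intervalMs for the thread to exit
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt status, stop waiting
        break;
      }
      if (t.isAlive()) {
        // Hypothetical stand-in for the thread dump printed by the real code.
        System.err.println("Still waiting on " + t.getName()
            + " (attempt " + (attempt + 1) + " of " + maxRetries + ")");
      }
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    Thread quick = new Thread(() -> { }, "quick");
    quick.start();
    System.out.println("joined: " + joinWithRetries(quick, 100, 5));
  }
}
```

With a bound like this, a shutdown path that never completes would at least return control to the caller, which could then fail the test instead of hanging.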
 Here is the full stack dump:
 {noformat}
 Thread 354 (IPC Client (47) connection to localhost/127.0.0.1:52227 from 
 nkeywal):
   State: TIMED_WAITING
   Blocked count: 360
   Waited count: 359
   Stack:
 java.lang.Object.wait(Native Method)
 org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:702)
 org.apache.hadoop.ipc.Client$Connection.run(Client.java:744)
 Thread 272 (Master:0;localhost,39664,1320187706355-EventThread):
   State: WAITING
   Blocked count: 0
   Waited count: 4
   Waiting on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@107b954b
   Stack:
 sun.misc.Unsafe.park(Native Method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
 Thread 271 
 (Master:0;localhost,39664,1320187706355-SendThread(localhost:21819)):
   State: RUNNABLE
   Blocked count: 2
   Waited count: 0
   Stack:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1107)
 Thread 152 (Master:0;localhost,39664,1320187706355):
   State: WAITING
   Blocked count: 217
   Waited count: 174
   Waiting on org.apache.hadoop.hbase.zookeeper.RootRegionTracker@6621477c
   Stack:
 java.lang.Object.wait(Native Method)
 java.lang.Object.wait(Object.java:485)
 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.blockUntilAvailable(ZooKeeperNodeTracker.java:131)

[jira] [Updated] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename


 [ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4553:
-

Attachment: 4553-v11.txt

Same as v10.

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 
 4553-v9.txt, HBase-4553-TestAvroServer.patch


 This comes of HBASE-4547.  The rename in 0.20 hdfs fails if the file exists 
 already.  In 0.20+ it's better, but there are still 'some' issues if a reader 
 exists when the file is renamed.  This issue is about fixing this (though we 
 depend on the fix first being in hdfs).
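
For illustration, the general "write a temporary file, then rename it over the target" pattern can be sketched on a local file system with java.nio.file. This is an analogy only, not the HBASE-4553 patch and not HDFS code: the class AtomicReplace and the ".tmp" suffix are hypothetical, and whether an atomic move replaces an existing target is implementation specific (it does on typical POSIX file systems).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicReplace {
  /** Writes content to a sibling temp file, then renames it over target. */
  static void replace(Path target, String content) throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
    // No separate delete step: where the platform supports it, readers see
    // either the old file or the new one, never a missing file.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }
}
```

The point of the sketch is the ordering: a remove followed by a rename leaves a window with no file at all, while a single rename-over does not.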


[jira] [Work started] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename

2011-11-02 Thread stack (Work started) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-4553 started by stack.

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 
 4553-v9.txt, HBase-4553-TestAvroServer.patch


 This comes of HBASE-4547.  The rename in 0.20 hdfs fails if the file exists 
 already.  In 0.20+ it's better, but there are still 'some' issues if a reader 
 exists when the file is renamed.  This issue is about fixing this (though we 
 depend on the fix first being in hdfs).


[jira] [Updated] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename


 [ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4553:
-

Status: Open  (was: Patch Available)

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 3446-v8.txt, 4553-v10.txt, 4553-v11.txt, 4553-v5.txt, 
 4553-v9.txt, HBase-4553-TestAvroServer.patch


 This comes of HBASE-4547.  The rename in 0.20 hdfs fails if the file exists 
 already.  In 0.20+ it's better, but there are still 'some' issues if a reader 
 exists when the file is renamed.  This issue is about fixing this (though we 
 depend on the fix first being in hdfs).


[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-02 Thread Gary Helmling (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142307#comment-13142307
 ] 

Gary Helmling commented on HBASE-4518:
--

@Stack,

Yes, for some reason in the original code I was being a bit too clever and 
registering the protocol handler directly.  Maybe it was before all the 
connecting bits had been filled in yet...  In any case, the manual registration 
in the original code would mean that the PingHandler would not get 
re-registered if a region closed on one RS and was reopened on another.  So 
that is a flaw.  And the test code should really be doing what we tell people 
to do with endpoints, which is to configure them as coprocessors.
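
The difference described above can be shown with a toy model. Nothing here is HBase API: ToyRegionServer and its methods are hypothetical stand-ins, with a list of suppliers playing the role of the coprocessor configuration. Handlers listed in the configuration are recreated whenever a region opens, while a manually registered handler lives only as long as the region instance it was attached to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Toy region server: tracks open regions and the handlers attached to each. */
final class ToyRegionServer {
  private final List<Supplier<Object>> configuredEndpoints;
  private final Map<String, List<Object>> handlersByRegion = new HashMap<>();

  ToyRegionServer(List<Supplier<Object>> configuredEndpoints) {
    this.configuredEndpoints = configuredEndpoints;
  }

  /** Opening a region instantiates every endpoint named in the configuration. */
  void openRegion(String region) {
    List<Object> handlers = new ArrayList<>();
    for (Supplier<Object> endpoint : configuredEndpoints) {
      handlers.add(endpoint.get());
    }
    handlersByRegion.put(region, handlers);
  }

  void closeRegion(String region) {
    handlersByRegion.remove(region);
  }

  /** Manual registration only affects the region instance open on this server. */
  void registerManually(String region, Object handler) {
    List<Object> handlers = handlersByRegion.get(region);
    if (handlers == null) {
      throw new IllegalStateException("region not open here: " + region);
    }
    handlers.add(handler);
  }

  int handlerCount(String region) {
    List<Object> handlers = handlersByRegion.get(region);
    return handlers == null ? 0 : handlers.size();
  }
}
```

In this model, a region that closes on one server and reopens on another keeps its handler only when the handler came from the shared configuration, which matches the flaw described in the comment.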

 TestServerCustomProtocol is flaky
 -

 Key: HBASE-4518
 URL: https://issues.apache.org/jira/browse/HBASE-4518
 Project: HBase
  Issue Type: Bug
  Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-4518.patch, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt, 
 org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt


 TestServerCustomProtocol has been showing some intermittent failures in 
 Jenkins due to what looks like region transitions.
 Here is the most recent failure:
 {noformat}
 Results :
 Failed tests:   
 testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol): 
 Results should contain region 
 test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
 {noformat}


[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files

2011-11-02 Thread Roman Shaposhnik (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4719:


Status: Patch Available  (was: Open)

 HBase script assumes pre-Hadoop 0.21 layout of jar files
 

 Key: HBASE-4719
 URL: https://issues.apache.org/jira/browse/HBASE-4719
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
 Attachments: HBASE-4719.patch.txt


 The following in the bin/hbase:
 {noformat}
 HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls 
 ${HADOOP_HOME}/hadoop-core*.jar`)
 {noformat}
 assumes a pre-21 Hadoop layout. It'll be better to dynamically account for 
 either hadoop-core* or hadoop-common*, hadoop-hdfs*, hadoop-mapreduce*


[jira] [Created] (HBASE-4727) Don't unconditionally delete UNASSIGNED ZNode for a region.

2011-11-02 Thread Madhuwanti Vaidya (Created) (JIRA)

Don't unconditionally delete UNASSIGNED ZNode for a region.
---

 Key: HBASE-4727
 URL: https://issues.apache.org/jira/browse/HBASE-4727
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.89.20100924
Reporter: Madhuwanti Vaidya
Assignee: Madhuwanti Vaidya
Priority: Minor


Unconditionally deleting an UNASSIGNED ZNode when master processes 
RS2ZK_REGION_OPENED (from the toDo queue) for a region has caused multiply 
assigned regions or unassigned regions. One proposed fix is to check whether 
the ZNode is actually in the state RS2ZK_REGION_OPENED before deleting it. 
Another fix is to not delete the ZNode at all.
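
The first proposed fix, deleting only when the node is still in the expected state, can be sketched with an in-memory map. This is not ZooKeeper or HBase code: ConditionalDelete and its methods are hypothetical, and a ConcurrentMap stands in for the UNASSIGNED znodes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConditionalDelete {
  // In-memory stand-in for the UNASSIGNED znodes: region name -> state.
  private final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();

  void setState(String region, String state) {
    znodes.put(region, state);
  }

  String getState(String region) {
    return znodes.get(region);
  }

  /** Deletes the node only if it is still in the expected state. */
  boolean deleteIfInState(String region, String expectedState) {
    // remove(key, value) is atomic: the entry is removed only when its
    // current value equals expectedState, and the result says whether it was.
    return znodes.remove(region, expectedState);
  }
}
```

Against ZooKeeper itself, a comparable guard is usually built by reading the node and passing the version that was read to the delete call, so the delete fails if the node changed in between.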


[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files

2011-11-02 Thread Roman Shaposhnik (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated HBASE-4719:


Attachment: HBASE-4719.patch.txt

 HBase script assumes pre-Hadoop 0.21 layout of jar files
 

 Key: HBASE-4719
 URL: https://issues.apache.org/jira/browse/HBASE-4719
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
 Attachments: HBASE-4719.patch.txt


 The following in the bin/hbase:
 {noformat}
 HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls 
 ${HADOOP_HOME}/hadoop-core*.jar`)
 {noformat}
 assumes a pre-21 Hadoop layout. It'll be better to dynamically account for 
 either hadoop-core* or hadoop-common*, hadoop-hdfs*, hadoop-mapreduce*


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142308#comment-13142308
 ] 

nkeywal commented on HBASE-4724:


It happened once out of maybe 20 tries.

For the null connection, there is at least a leak in 
{noformat}
  @Before
  public void setUp() throws Exception {
    this.admin = new HBaseAdmin(TEST_UTIL.getConfiguration());
  }
{noformat}

But I guess it's not the only one, because I saw quite a lot of lines in the 
logs from Jenkins.
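
One way to avoid this kind of per-test leak is to release in a tear-down method whatever the set-up method created. The sketch below shows only the pattern and is not taken from any patch on this issue: FakeAdmin is a hypothetical stand-in for a client object holding a connection, and the methods would carry @Before and @After in a JUnit 4 test.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

final class SetUpTearDownSketch {
  /** Hypothetical stand-in for a client object that holds a connection. */
  static final class FakeAdmin implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeAdmin() {
      OPEN.incrementAndGet();
    }

    @Override
    public void close() {
      OPEN.decrementAndGet();
    }
  }

  private FakeAdmin admin;

  // Would be annotated @Before: runs before every test method.
  void setUp() {
    this.admin = new FakeAdmin();
  }

  // Would be annotated @After: closes the instance created by setUp.
  void tearDown() {
    if (admin != null) {
      admin.close();
      admin = null;
    }
  }
}
```

Without the tear-down step, every test method leaves one more open instance behind, which fits the many close messages seen in the logs.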

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch



[jira] [Updated] (HBASE-3716) Intermittent TestRegionRebalancing failure


 [ 
https://issues.apache.org/jira/browse/HBASE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3716:
--

Fix Version/s: 0.92.0

 Intermittent TestRegionRebalancing failure
 --

 Key: HBASE-3716
 URL: https://issues.apache.org/jira/browse/HBASE-3716
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 3716-addendum.txt, 3716.txt


 See HBase-TRUNK build #1820
 This could be due to HBASE-3681
 In trunk, default value of hbase.regions.slop is 20%. It is possible for 
 load balancer to see region distribution which falls within 20% of optimal 
 distribution.
 However, assertRegionsAreBalanced() uses 10% slop.
 One solution is to align the slop in assertRegionsAreBalanced() with 
 hbase.regions.slop value.
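
A small worked example shows how the two slop values can disagree. The sketch is an assumption about how such a check might be written, not the actual assertRegionsAreBalanced() code: SlopCheck, isBalanced, and the bound formulas (ceiling and floor of the average scaled by the slop) are hypothetical.

```java
final class SlopCheck {
  /**
   * Returns true if every server's region count lies within slopPercent of
   * the average load. Integer arithmetic avoids floating-point rounding.
   */
  static boolean isBalanced(int[] regionCounts, int slopPercent) {
    int servers = regionCounts.length;
    if (servers == 0) {
      return true; // nothing to balance
    }
    long total = 0;
    for (int count : regionCounts) {
      total += count;
    }
    long denom = 100L * servers;
    long upper = (total * (100 + slopPercent) + denom - 1) / denom; // ceiling
    long lower = (total * (100 - slopPercent)) / denom;             // floor
    for (int count : regionCounts) {
      if (count < lower || count > upper) {
        return false;
      }
    }
    return true;
  }
}
```

With three servers holding 12, 10 and 8 regions the average is 10. At 20% slop the allowed range is 8 to 12, so a balancer using that value sees nothing to do; at 10% slop the range is 9 to 11, so a test asserting with that value fails on the same distribution.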


[jira] [Created] (HBASE-4728) Clean up noisy HBaseAdmin#close messages

2011-11-02 Thread stack (Created) (JIRA)

Clean up noisy HBaseAdmin#close messages


 Key: HBASE-4728
 URL: https://issues.apache.org/jira/browse/HBASE-4728
 Project: HBase
  Issue Type: Bug
Reporter: stack


See tail of HBASE-4724


[jira] [Commented] (HBASE-4724) TestAdmin hangs randomly in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-4724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142320#comment-13142320
 ] 

stack commented on HBASE-4724:
--

Ok.  Want to make a new issue to fix the leak?

It just failed for me with this:

{code}
Tests in error: 
  testHundredsOfTable(org.apache.hadoop.hbase.client.TestAdmin): Call to 
sv4r11s38/10.4.11.38:46152 failed on socket timeout exception: 
java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel 
to be ready for read. ch : java.nio.channels.SocketChannel[connected 
local=/10.4.11.38:52051 remote=sv4r11s38/10.4.11.38:46152]
{code}

Is 1500ms not enough? Did you change that?

On the message:

{code}

2011-11-02 03:36:17,272 DEBUG [Finalizer] 
client.HConnectionManager$HConnectionImplementation(1715): The connection to 
null has been closed.

{code}

... yeah, it looks like this test is making lots of instances of HBaseAdmin... 
ones it should be cleaning up, but the message does look harmless: a close of 
an already-closed connection.  I made HBASE-4728 for this.

 TestAdmin hangs randomly in trunk
 -

 Key: HBASE-4724
 URL: https://issues.apache.org/jira/browse/HBASE-4724
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Critical
 Fix For: 0.92.0

 Attachments: 2002_4724_TestAdmin.patch, 
 2002_4724_TestAdmin.v2.patch



[jira] [Commented] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files


[ 
https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142322#comment-13142322
 ] 

stack commented on HBASE-4719:
--

Have you tried it on hbase trunk and then on an hbase with 0.21+ hadoop plugged 
in, Roman?

 HBase script assumes pre-Hadoop 0.21 layout of jar files
 

 Key: HBASE-4719
 URL: https://issues.apache.org/jira/browse/HBASE-4719
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
 Attachments: HBASE-4719.patch.txt


 The following in the bin/hbase:
 {noformat}
 HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls 
 ${HADOOP_HOME}/hadoop-core*.jar`)
 {noformat}
 assumes a pre-21 Hadoop layout. It would be better to dynamically account for 
 either hadoop-core* or hadoop-common*, hadoop-hdfs*, hadoop-mapreduce*

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation


[ 
https://issues.apache.org/jira/browse/HBASE-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142323#comment-13142323
 ] 

Hudson commented on HBASE-4713:
---

Integrated in HBase-0.92 #100 (See 
[https://builds.apache.org/job/HBase-0.92/100/])
HBASE-4713 Raise debug level to warn on ExecutionException in 
HConnectionManager

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java


 Raise debug level to warn on ExecutionException in 
 HConnectionManager$HConnectionImplementation
 ---

 Key: HBASE-4713
 URL: https://issues.apache.org/jira/browse/HBASE-4713
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Lucian George Iordache
 Fix For: 0.92.0

 Attachments: 4713.patch, HBASE-4713-patch.txt


 The ExecutionException is logged at debug level, but it should be logged at 
 warn. I hit the problem in the following case:
 - hbase.rpc.timeout = 6
 - lease time on region server = 24
 - started a scan that takes more than 60 seconds on the region server == 
 SocketTimeoutException logged on debug
 With the log level set to info, the exception was not observable on the client 
 side and it took me a while to figure out what was happening.
 See also:
 - https://issues.apache.org/jira/browse/HBASE-3154
 - 
 http://mail-archives.apache.org/mod_mbox/hbase-user/201110.mbox/%3CCANH3+J0athaCjK-ahu-A=hrzoosjyh6s_mtpzm3_qqpfrcs...@mail.gmail.com%3E
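The request above can be illustrated with a small sketch. It assumes java.util.logging rather than the logging facade HBase itself uses, and the class WarnOnExecutionException and the method callAndLog are hypothetical names. The point is only that a failure wrapped in an ExecutionException is reported at WARNING, so it still shows up when the logger is set to info:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Level;
import java.util.logging.Logger;

final class WarnOnExecutionException {
    static final Logger LOG = Logger.getLogger("WarnOnExecutionException");

    /** Runs the task; returns its result, or null after logging the failure at WARNING. */
    static <T> T callAndLog(ExecutorService pool, Callable<T> task) {
        Future<T> future = pool.submit(task);
        try {
            return future.get();
        } catch (ExecutionException e) {
            // WARNING is visible at the default INFO level; a debug-level
            // record (FINE in java.util.logging) would be dropped.
            LOG.log(Level.WARNING, "Task failed: " + e.getCause(), e.getCause());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Simulates a scan that outlives the RPC timeout.
            callAndLog(pool, () -> {
                throw new java.net.SocketTimeoutException("scan exceeded the RPC timeout");
            });
        } finally {
            pool.shutdown();
        }
    }
}
```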


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load

2011-11-02 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142326#comment-13142326
 ] 

Todd Lipcon commented on HBASE-4716:


this was already committed, but I just got to my email for the day:
- why does getColumnFamilyType return an enum when all we care about is 
hasMultipleColumnFamilies() (a boolean)? This makes it harder to understand. 
(this function doesn't return a type of column family.)
- even if you use an enum, our style is not ALL_CAPS for enum class names. Only 
ALL_CAPS for the values.
- why is the new function public? it should be private.



 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.
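A rough sketch of the locking choice described in the issue, assuming a java.util.concurrent ReentrantReadWriteLock guards the region. The class BulkLoadLocking and its methods are hypothetical and only model the decision; they are not the HRegion code:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class BulkLoadLocking {
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

    /** Write lock only when several families must become visible together. */
    Lock lockFor(boolean multipleFamilies) {
        return multipleFamilies ? regionLock.writeLock() : regionLock.readLock();
    }

    /** Runs the load under the chosen lock and reports which mode was used. */
    String bulkLoad(boolean multipleFamilies, Runnable load) {
        Lock lock = lockFor(multipleFamilies);
        lock.lock();
        try {
            load.run();
            return regionLock.isWriteLockedByCurrentThread() ? "write" : "read";
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        BulkLoadLocking region = new BulkLoadLocking();
        System.out.println(region.bulkLoad(false, () -> { }));
        System.out.println(region.bulkLoad(true, () -> { }));
    }
}
```

Taking only the read lock in the single-family case lets other readers and writers of the region proceed while the load runs, which is the benefit the issue is after.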


[jira] [Created] (HBASE-4729) Race between online altering and splitting kills the master

2011-11-02 Thread Jean-Daniel Cryans (Created) (JIRA)

Race between online altering and splitting kills the master
---

 Key: HBASE-4729
 URL: https://issues.apache.org/jira/browse/HBASE-4729
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0, 0.94.0


I was running an online alter while regions were splitting, and suddenly the 
master died and left my table half-altered (haven't restarted the master yet).

What killed the master:

{quote}
2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: 
Unexpected ZK exception creating node CLOSING
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101
at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)
at 
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)
at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)
at 
org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)
at 
org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)
at 
org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)
at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{quote}

A znode was created because the region server was splitting the region 4 
seconds before:

{quote}
2011-11-02 17:06:40,704 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.
2011-11-02 17:06:40,704 DEBUG 
org.apache.hadoop.hbase.regionserver.SplitTransaction: 
regionserver:62023-0x132f043bbde0710 Creating ephemeral node for 
f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state
2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x132f043bbde0710 Attempting to transition node 
f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLITTING
...
2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:62023-0x132f043bbde0710 Successfully transitioned node 
f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to 
RS_ZK_REGION_SPLIT
2011-11-02 17:06:44,061 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
master to process the split for f7e1783e65ea8d621a4bc96ad310f101
{quote}

Now that the master is dead the region server is spewing those last two lines 
like mad.
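The race can be modelled without ZooKeeper. The sketch below is only a simulation under stated assumptions: a ConcurrentHashMap stands in for the /hbase/unassigned znodes, and the class UnassignedNodes with its methods is hypothetical. It shows one possible way for the master to treat an already existing node as a conflict to skip rather than a fatal error; it is not the fix adopted for this issue:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class UnassignedNodes {
    // Region name -> state, standing in for the znodes under /hbase/unassigned.
    private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

    /** Creates the node only if absent; returns the state already there, or null if created. */
    String createIfAbsent(String region, String state) {
        return nodes.putIfAbsent(region, state);
    }

    /** Master-side attempt to mark a region CLOSING that reports a conflict instead of aborting. */
    boolean tryMarkClosing(String region) {
        String existing = createIfAbsent(region, "CLOSING");
        if (existing != null) {
            System.err.println("Skipping unassign of " + region
                + ": node already exists in state " + existing);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        UnassignedNodes zk = new UnassignedNodes();
        String region = "f7e1783e65ea8d621a4bc96ad310f101";
        // The region server starts splitting first ...
        zk.createIfAbsent(region, "RS_ZK_REGION_SPLITTING");
        // ... then the online alter asks the master to unassign the same region.
        System.out.println("unassign started: " + zk.tryMarkClosing(region));
    }
}
```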


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142339#comment-13142339
 ] 

Ted Yu commented on HBASE-4716:
---

When creating a method for hasMultipleColumnFamilies(), I found that I should 
deal with the possibility of familyPaths being null. So I created an enum that 
can represent three states.
I thought getColumnFamilyType() might be useful on other occasions.
For now, I can change it to private.

I will change spelling for the enum type as well.

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Updated] (HBASE-3898) TestSplitTransactionOnCluster broke in TRUNK


 [ 
https://issues.apache.org/jira/browse/HBASE-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-3898:
--

Fix Version/s: 0.92.0

 TestSplitTransactionOnCluster broke in TRUNK
 

 Key: HBASE-3898
 URL: https://issues.apache.org/jira/browse/HBASE-3898
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3898.txt


 It hangs for 15 minutes.  I see an NPE trying to split a region.  The splitKey 
 passed is null.  Looks to be a by-product of recent compaction refactorings.


[jira] [Updated] (HBASE-4719) HBase script assumes pre-Hadoop 0.21 layout of jar files


 [ 
https://issues.apache.org/jira/browse/HBASE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4719:
-

Fix Version/s: 0.92.0
 Assignee: Roman Shaposhnik

 HBase script assumes pre-Hadoop 0.21 layout of jar files
 

 Key: HBASE-4719
 URL: https://issues.apache.org/jira/browse/HBASE-4719
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.92.0

 Attachments: HBASE-4719.patch.txt


 The following in the bin/hbase:
 {noformat}
 HADOOPCPPATH=$(append_path ${HADOOPCPPATH} `ls 
 ${HADOOP_HOME}/hadoop-core*.jar`)
 {noformat}
 assumes a pre-21 Hadoop layout. It would be better to dynamically account for 
 either hadoop-core* or hadoop-common*, hadoop-hdfs*, hadoop-mapreduce*


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142361#comment-13142361
 ] 

Ted Yu commented on HBASE-4716:
---

When familyPaths is null, hasMultipleColumnFamilies() returns false.
Just want to confirm.

Personally I think patch v1 looks cleaner.
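The boolean variant under discussion might look roughly like the sketch below. It is an illustration only: the class BulkLoadFamilies and its nested FamilyPath type are hypothetical stand-ins for the (family, path) pairs, and the patch itself may differ. A null list is treated as "not multiple", as stated in the comment above:

```java
import java.util.Arrays;
import java.util.List;

final class BulkLoadFamilies {
    /** Hypothetical (column family, HFile path) pair. */
    static final class FamilyPath {
        final String family;
        final String path;

        FamilyPath(String family, String path) {
            this.family = family;
            this.path = path;
        }
    }

    /** Returns true only if the list names at least two different families. */
    static boolean hasMultipleColumnFamilies(List<FamilyPath> familyPaths) {
        if (familyPaths == null) {
            return false; // nothing to load, so no multi-family atomicity concern
        }
        String first = null;
        for (FamilyPath fp : familyPaths) {
            if (first == null) {
                first = fp.family;
            } else if (!first.equals(fp.family)) {
                return true; // a second, different family was found
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<FamilyPath> sameFamily = Arrays.asList(
            new FamilyPath("cf1", "/bulk/a"), new FamilyPath("cf1", "/bulk/b"));
        List<FamilyPath> twoFamilies = Arrays.asList(
            new FamilyPath("cf1", "/bulk/a"), new FamilyPath("cf2", "/bulk/c"));
        System.out.println(hasMultipleColumnFamilies(null));
        System.out.println(hasMultipleColumnFamilies(sameFamily));
        System.out.println(hasMultipleColumnFamilies(twoFamilies));
    }
}
```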

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load


 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

Attachment: (was: 4716.addendum)

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load


 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

Attachment: 4716.addendum

Removed enum in addendum.
TestLoadIncrementalHFilesSplitRecovery passes.

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.


 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-1744:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: 
 0001-thrift2-enable-usage-of-.deleteColumns-for-thrift.patch, 1744-trunk.10, 
 1744.addendum, HBASE-1744.11.patch, HBASE-1744.2.patch, HBASE-1744.3.patch, 
 HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
 HBASE-1744.7.patch, HBASE-1744.8.patch, HBASE-1744.9.patch, 
 HBASE-1744.preview.1.patch, thriftexperiment.patch


 This mutateRows, etc. is a little confusing compared to the new, cleaner Java 
 client.
 Thinking of ways to make a Thrift client that is just as elegant. Something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose RPC than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
 still be intuitive from, say, Python.
 Presumably the goal of a Thrift gateway is to be easy first.


[jira] [Commented] (HBASE-4725) NPE in AM#updateTimers

[
https://issues.apache.org/jira/browse/HBASE-4725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142393#comment-13142393
]

Hadoop QA commented on HBASE-4725:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501982/am.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 javadoc. The javadoc tool appears to have generated -165 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 42 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/140//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/140//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/140//console

This message is automatically generated.

NPE in AM#updateTimers
--

Key: HBASE-4725
URL: https://issues.apache.org/jira/browse/HBASE-4725
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Fix For: 0.92.0

Attachments: am.txt

[jira] [Updated] (HBASE-4716) Improve locking for single column family bulk load


 [ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4716:
--

Attachment: (was: 4716.addendum)

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Commented] (HBASE-4583) Integrate RWCC with Append and Increment operations

2011-11-02 Thread Lars Hofhansl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142398#comment-13142398
 ] 

Lars Hofhansl commented on HBASE-4583:
--

You make a good point. If people want performance they'd pass false as 
writeToWAL. Otherwise they will get correct but slow behavior. 

 Integrate RWCC with Append and Increment operations
 ---

 Key: HBASE-4583
 URL: https://issues.apache.org/jira/browse/HBASE-4583
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 4583-v2.txt, 4583-v3.txt, 4583-v4.txt, 4583.txt


 Currently Increment and Append operations do not work with RWCC and hence a 
 client could see the results of multiple such operations mixed in the same 
 Get/Scan.
 The semantics might be a bit more interesting here as upsert adds and removes 
 to and from the memstore.


[jira] [Commented] (HBASE-4716) Improve locking for single column family bulk load

2011-11-02 Thread Todd Lipcon (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142399#comment-13142399
 ] 

Todd Lipcon commented on HBASE-4716:


+1 on addendum, thanks Ted

 Improve locking for single column family bulk load
 --

 Key: HBASE-4716
 URL: https://issues.apache.org/jira/browse/HBASE-4716
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4716-v2.txt, 4716.addendum, 4716.txt


 HBASE-4552 changed the locking behavior for single column family bulk load, 
 namely we don't need to take write lock.
 A read lock would suffice in this scenario.


[jira] [Commented] (HBASE-4518) TestServerCustomProtocol is flaky

2011-11-02 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142400#comment-13142400
]

Hadoop QA commented on HBASE-4518:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12501985/HBASE-4518.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated -165 warning
messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 42 new Findbugs (version
1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:

org.apache.hadoop.hbase.master.TestDistributedLogSplitting
org.apache.hadoop.hbase.TestGlobalMemStoreSize
org.apache.hadoop.hbase.master.TestMasterFailover

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/141//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/141//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/141//console

This message is automatically generated.

TestServerCustomProtocol is flaky
-

Key: HBASE-4518
URL: https://issues.apache.org/jira/browse/HBASE-4518
Project: HBase
Issue Type: Bug
Components: coprocessors, test
Affects Versions: 0.92.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Fix For: 0.92.0

Attachments: HBASE-4518.patch,
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol-output.txt,
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol.txt

TestServerCustomProtocol has been showing some intermittent failures in
Jenkins due to what looks like region transitions.
Here is the most recent failure:
{noformat}
Results :
Failed tests:
testRowRange(org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol):
Results should contain region
test,bbb,1317332645939.aea9154349b9e0dc207e2e9476702763. for row 'bbb'
{noformat}

[jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows


[ 
https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142405#comment-13142405
 ] 

jirapos...@reviews.apache.org commented on HBASE-4536:
--



bq.  On 2011-11-02 06:44:54, Prakash Khemani wrote:
bq.   
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java,
 lines 210-212
bq.   https://reviews.apache.org/r/2178/diff/12/?file=50954#file50954line210
bq.  
bq.   Hi Lars, Isn't this early-out problematic? It doesn't take into 
account min-versions. It doesn't take into account the newly introduced 
keepDeletedCells mode.

The early out only happens when minVersions is not set.  Check out *ColumnTracker


- Lars


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2178/#review3009
---


On 2011-10-18 21:43:38, Lars Hofhansl wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2178/
bq.  ---
bq.  
bq.  (Updated 2011-10-18 21:43:38)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  HBase timerange Gets and Scans allow time travel in HBase, i.e. 
looking at the state of the data at any point in the past, provided the data is 
still around.
bq.  This did not work for deletes, however. Deletes would always mask all puts 
in the past.
bq.  This change adds a flag that can be set on HColumnDescriptor to enable 
retention of deleted rows.
bq.  These rows are still subject to TTL and/or VERSIONS.
bq.  
bq.  This changes the following:
bq.  1. There is a new flag on HColumnDescriptor enabling that behavior.
bq.  2. Allow gets/scans with a timerange to retrieve rows hidden by a delete 
marker, if the timerange does not include the delete marker.
bq.  3. Do not unconditionally collect all deleted rows during a compaction.
bq.  4. Allow a raw Scan, which retrieves all delete markers and deleted rows.
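Point 2 of the list above can be sketched as a simplified visibility rule. This is a toy model under stated assumptions, not the ScanQueryMatcher logic: one cell, delete markers that mask every put at or before their timestamp, a time range that is inclusive at the lower bound and exclusive at the upper bound, and no TTL or VERSIONS handling. The class KeepDeletedModel and the method isVisible are hypothetical names:

```java
final class KeepDeletedModel {
    /**
     * Decides whether a put written at putTs is returned by a scan over
     * [minTs, maxTs). Each delete marker masks puts at or before its timestamp.
     */
    static boolean isVisible(long putTs, long[] deleteMarkers,
                             long minTs, long maxTs, boolean keepDeletedCells) {
        if (putTs < minTs || putTs >= maxTs) {
            return false; // the put itself is outside the requested range
        }
        for (long marker : deleteMarkers) {
            if (marker < putTs) {
                continue; // a marker older than the put cannot mask it
            }
            boolean markerInRange = marker >= minTs && marker < maxTs;
            // Old behavior: any later marker masks the put.
            // New behavior: only a marker inside the scanned range masks it.
            if (!keepDeletedCells || markerInRange) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] markers = {20L}; // the row was deleted at t=20
        // Put at t=10, scan restricted to [0, 15): the marker is outside the range.
        System.out.println(isVisible(10L, markers, 0L, 15L, true));
        System.out.println(isVisible(10L, markers, 0L, 15L, false));
        // Scan over [0, 30) includes the marker, so the put stays hidden.
        System.out.println(isVisible(10L, markers, 0L, 30L, true));
    }
}
```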
bq.  
bq.  The change is smallish, but the logic is intricate, so please review 
carefully.
bq.  
bq.  
bq.  This addresses bug HBASE-4536.
bq.  https://issues.apache.org/jira/browse/HBASE-4536
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Attributes.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
 1185362 
bq.http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 
1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompaction.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
 1185362 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java
 1185362

[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.


 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Status: Patch Available  (was: Open)

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Instant_Schema_change_through_ZK.patch, 
 4213-Nov-2-2011_patch_.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.


[jira] [Updated] (HBASE-4213) Support for fault tolerant, instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) through ZK.


 [ 
https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4213:
--

Attachment: 4213-Nov-2-2011_patch_.patch

Patch from Subbu for TRUNK.

 Support for fault tolerant, instant schema updates with out master's 
 intervention (i.e with out enable/disable and bulk assign/unassign) through 
 ZK.
 

 Key: HBASE-4213
 URL: https://issues.apache.org/jira/browse/HBASE-4213
 Project: HBase
  Issue Type: Improvement
Reporter: Subbu M Iyer
Assignee: Subbu M Iyer
 Fix For: 0.92.0

 Attachments: 
 4213-101211-Support_instant_schema_changes_through_ZK.patch, 
 4213-102511.patch, 4213-Instant_Schema_change_through_ZK.patch, 
 4213-Nov-2-2011_patch_.patch, 
 4213-V10-Support_instant_schema_changes_through_ZK.patch, 
 4213-V5-Support_instant_schema_changes_through_ZK.patch, 
 4213-V7-Support_instant_schema_changes_through_ZK.patch, 
 4213-V8-Support_instant_schema_changes_through_ZK.patch, 
 4213-V9-Support_instant_schema_changes_through_ZK.patch, 4213-v9.txt, 
 4213.v6, HBASE-4213-Instant_schema_change.patch, 
 HBASE-4213_Instant_schema_change_-Version_2_.patch, 
 HBASE_Instant_schema_change-version_3_.patch


 This Jira is a slight variation in approach to what is being done as part of 
 https://issues.apache.org/jira/browse/HBASE-1730
 Support instant schema updates such as Modify Table, Add Column, Modify 
 Column operations:
 1. Without enabling/disabling the table.
 2. Without bulk unassign/assign of regions.


[jira] [Commented] (HBASE-4713) Raise debug level to warn on ExecutionException in HConnectionManager$HConnectionImplementation