[jira] [Created] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
Make HBase Thrift server more configurable and add a command-line UI test
-

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin


This started as an internal hotfix after we found that the Thrift server had 
spawned 15000 threads. To bound the thread pool size, I added a custom thread 
pool server implementation called HBaseThreadPoolServer to the HBase codebase 
and made the following parameters configurable both from the command line and 
as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
Under increasing load, the server creates a new thread for every connection 
until the pool size reaches minWorkerThreads. After that, the server puts new 
connections into the queue and only creates a new thread when the queue is 
full. If an attempt to create a new thread fails, the server drops the 
connection. The default TThreadPoolServer would crash in that case, but that 
never happened because its thread pool was unbounded; instead the server would 
hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
the client side.
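
For readers who want to see the sizing policy described above in isolation, 
here is a minimal Java sketch. It is not the HBaseThreadPoolServer code from 
the patch. It assumes that a plain java.util.concurrent.ThreadPoolExecutor with 
a bounded ArrayBlockingQueue is an acceptable stand-in, since that executor 
grows to its core size first, then queues, and only grows toward its maximum 
when the queue is full. The class name BoundedPoolSketch and the simulate 
helper are made up for illustration; only the three parameter names come from 
the description.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not the class added by the patch.
class BoundedPoolSketch {

  // Builds a pool whose thread count and queue length are both bounded.
  static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
      int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads,            // threads created eagerly, one per task
        maxWorkerThreads,            // hard upper bound on threads
        60L, TimeUnit.SECONDS,       // idle timeout for threads above the minimum
        new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
  }

  // Submits `connections` tasks that block until released, then reports
  // {threads created, tasks waiting in the queue, tasks rejected}.
  static int[] simulate(int min, int max, int queued, int connections) {
    ThreadPoolExecutor pool = newPool(min, max, queued);
    final CountDownLatch release = new CountDownLatch(1);
    int dropped = 0;
    for (int i = 0; i < connections; i++) {
      try {
        pool.execute(() -> {
          try {
            release.await();         // simulate a long-lived client connection
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      } catch (RejectedExecutionException e) {
        dropped++;                   // the "drop the connection" case
      }
    }
    int threads = pool.getPoolSize();
    int waiting = pool.getQueue().size();
    release.countDown();             // let all simulated connections finish
    pool.shutdown();
    return new int[] {threads, waiting, dropped};
  }

  public static void main(String[] args) {
    int[] r = simulate(2, 4, 3, 10);
    System.out.println("threads=" + r[0] + " queued=" + r[1] + " dropped=" + r[2]);
  }
}
```

With minWorkerThreads=2, maxWorkerThreads=4 and maxQueuedRequests=3, ten 
simultaneous long-lived connections end up as 4 threads, 3 queued requests and 
3 dropped connections.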

Another part of this fix is refactoring and unit testing the command-line 
part of the Thrift server. The logic there is fairly complicated, and the 
existing TestThriftServer class does not test that part at all. The new 
TestThriftServerCmdLine test starts the Thrift server on a random port with 
various combinations of options and talks to it through the client API from 
another thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.1.patch

mbautin requested code review of [jira] [HBASE-4863] Make HBase Thrift server 
more configurable and add a command-line UI test.
Reviewers: JIRA, Kannan, tedyu, stack


TEST PLAN
  Unit tests, cluster test with a Python Thrift client.
  I will post an update when I'm done with testing.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java







[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.2.patch

mbautin updated the revision [jira] [HBASE-4863] Make HBase Thrift server more 
configurable and add a command-line UI test.
Reviewers: JIRA, Kannan, tedyu, stack

  Updating with the most recent version. Posted a stale version at first -- 
sorry for spam.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java






[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156629#comment-13156629
 ] 

Hudson commented on HBASE-4785:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4785 Improve recovery time of HBase client when a region server dies.

nspiegelberg : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java


 Improve recovery time of HBase client when a region server dies.
 

 Key: HBASE-4785
 URL: https://issues.apache.org/jira/browse/HBASE-4785
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-4785.patch, HBASE-4785.patch


 When a region server dies, the HBase client waits until the RPC times out 
 before learning that it needs to check META to find the new location of the 
 region. And it incurs this *timeout* cost for every region being served by 
 the dead region server. Remove this overhead by clearing the entries in the 
 cache that have the dead region server as their values.
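
The cache-clearing idea can be sketched as follows. This is an illustration 
only, not the HConnectionManager change: the class RegionCacheSketch, its 
String keys and values, and the method names are assumptions, and the real 
client caches richer location objects per table.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, single-threaded stand-in for a client-side region location cache.
class RegionCacheSketch {
  // region start key -> name of the server believed to host that region
  private final Map<String, String> cachedLocations = new HashMap<>();

  void cache(String regionStartKey, String server) {
    cachedLocations.put(regionStartKey, server);
  }

  String lookup(String regionStartKey) {
    return cachedLocations.get(regionStartKey);
  }

  // Removes every entry whose value is the dead server and returns how many
  // entries were dropped, so one failure invalidates all affected regions.
  int clearCachedLocationsForServer(String deadServer) {
    int removed = 0;
    Iterator<Map.Entry<String, String>> it = cachedLocations.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue().equals(deadServer)) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
```

After the first failed RPC to a server, a single call to 
clearCachedLocationsForServer would force fresh lookups for all of its regions 
instead of paying one timeout per region.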





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156630#comment-13156630
 ] 

Hudson commented on HBASE-4308:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4308 Race between RegionOpenedHandler and AssignmentManager(Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java


 Race between RegionOpenedHandler and AssignmentManager
 --

 Key: HBASE-4308
 URL: https://issues.apache.org/jira/browse/HBASE-4308
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch


 When the master is processing a ZK event for REGION_OPENED, it calls delete() 
 on the znode before it removes the node from RegionsInTransition. If the 
 notification of that delete comes back into AssignmentManager before the 
 region is removed from RIT, you see an error like:
 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
 master.AssignmentManager(861): Node deleted but still in RIT: 
 .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
 server=todd-w510,55655,1314751396840
 Not certain if it causes issues, but it's a concerning log message.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156631#comment-13156631
 ] 

Hudson commented on HBASE-4853:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, 
 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, 
 4853-v9.txt, 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.





[jira] [Commented] (HBASE-4787) Make corePool as a configurable parameter in HTable

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156627#comment-13156627
 ] 

Hudson commented on HBASE-4787:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4787 Rename HTable thread pool

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java


 Make corePool as a configurable parameter in HTable
 ---

 Key: HBASE-4787
 URL: https://issues.apache.org/jira/browse/HBASE-4787
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: HBASE-4787.patch


 Make the corePool a configurable parameter in HTable so that we can tune it 
 in the config file.  While at it, change the core pool name so we can 
 distinguish it from other AppServer pools.





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156628#comment-13156628
 ] 

Hudson commented on HBASE-4739:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4739  Master dying while going to close a region can leave it in 
transition
   forever (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java


 Master dying while going to close a region can leave it in transition forever
 -

 Key: HBASE-4739
 URL: https://issues.apache.org/jira/browse/HBASE-4739
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
 HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, 
 HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, 
 HBASE-4739_trial.patch, HBASE-4739_trial6.patch


 I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday. When 
 the master died, it had just created the RIT znode for a region but hadn't 
 told the RS to close it yet.
 When the master restarted it saw the znode and started printing this:
 {quote}
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
 state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
 2011-11-03 00:02:49,130 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
 too long, this should eventually complete or the server will expire, doing 
 nothing
 {quote}
 It's never going to happen, and it's blocking balancing.
 I'm marking this as minor since I believe this situation is pretty rare 
 unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156632#comment-13156632
 ] 

Hudson commented on HBASE-4783:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4783 Improve RowCounter to count rows in a specific key range.

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Improve RowCounter to count rows in a specific key range.
 -

 Key: HBASE-4783
 URL: https://issues.apache.org/jira/browse/HBASE-4783
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Trivial
 Fix For: 0.94.0

 Attachments: 4783.txt, HBASE-4783.patch


 Currently RowCounter in the MR package is a very simple map-only job that 
 does a full scan of a table. Enhance the utility to let the user specify a 
 key range and count the number of rows in this range.
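
A rough sketch of what counting within a key range means, using a sorted 
in-memory map as a stand-in for a table. Nothing here is the RowCounter 
implementation; KeyRangeCountSketch and countRows are hypothetical names, and 
the half-open range (start inclusive, stop exclusive) is chosen to mirror how 
HBase scans treat start and stop rows.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; a TreeMap stands in for a table sorted by row key.
class KeyRangeCountSketch {

  // Counts rows with startRow <= key < stopRow without touching other rows.
  static long countRows(NavigableMap<String, String> table,
      String startRow, String stopRow) {
    return table.subMap(startRow, true, stopRow, false).size();
  }

  public static void main(String[] args) {
    NavigableMap<String, String> table = new TreeMap<>();
    for (int i = 1; i <= 5; i++) {
      table.put("row-0" + i, "value" + i);
    }
    // Covers row-02 and row-03 only, because the stop row is exclusive.
    System.out.println(countRows(table, "row-02", "row-04"));
  }
}
```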





[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156635#comment-13156635
 ] 

Hudson commented on HBASE-4789:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


 On split, parent region is sticking around in oldest sequenceid to region map 
 though not online; we don't cleanup WALs.
 ---

 Key: HBASE-4789
 URL: https://issues.apache.org/jira/browse/HBASE-4789
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt


 Here is log for a particular region:
 {code}
 2011-11-15 05:46:31,382 INFO 
 org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
 master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to 
 transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to 
 RS_ZK_REGION_SPLIT
 2011-11-15 05:46:31,484 INFO 
 org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META 
 updated, and report to master. 
 Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., 
 new regions: 
 TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., 
 TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8.. Split 
 took 1sec
 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Compacting 
 hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
 hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top,
  keycount=717559, bloomtype=NONE, size=711.1m
 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Compacting 
 hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
 hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top,
  keycount=416691, bloomtype=NONE, size=412.9m
 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Compacting 
 hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
 hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom,
  keycount=717559, bloomtype=NONE, size=711.1m
 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Compacting 
 hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
 hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom,
  keycount=416691, bloomtype=NONE, size=412.9m
 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
 Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is 
 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
 Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 
 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: 
 Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, 
 requester=null
 2011-11-15 05:58:01,358 INFO 

[jira] [Commented] (HBASE-4772) Utility to Create StoreFiles

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156633#comment-13156633
 ] 

Hudson commented on HBASE-4772:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4772 Utility to Create StoreFiles

karthik : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java


 Utility to Create StoreFiles
 

 Key: HBASE-4772
 URL: https://issues.apache.org/jira/browse/HBASE-4772
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Nicolas Spiegelberg
Assignee: Mikhail Bautin
Priority: Minor
 Fix For: 0.94.0

 Attachments: HBASE-4772-B.patch, HBASE-4772.patch


 Add a tool to create a StoreFile with the specified number of key/value 
 pairs, with the specified compression and Bloom filter type.  This is useful 
 for creating HFileV1 and HFileV2 store files for testing.





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156636#comment-13156636
 ] 

Hudson commented on HBASE-4857:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4857  Recursive loop on KeeperException in 
AuthenticationTokenSecretManager

garyh : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java


 Recursive loop on KeeperException in 
 AuthenticationTokenSecretManager/ZKLeaderManager
 -

 Key: HBASE-4857
 URL: https://issues.apache.org/jira/browse/HBASE-4857
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.92.0, 0.94.0
Reporter: Gary Helmling
Assignee: Gary Helmling
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-4857.patch


 Looking through stack traces for {{TestMasterFailover}}, I see a case where 
 the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
 when a {{KeeperException}} is encountered:
 {noformat}
 Thread-1-EventThread daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 
 waiting on condition [0x7f9fab376000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
 at java.lang.Thread.sleep(Native Method)
 at java.lang.Thread.sleep(Thread.java:302)
 at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
 at 
 org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
 at 
 org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
 at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
 at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
 at 
 org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
 at 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
 {noformat}
 The {{KeeperException}} causes {{ZKLeaderManager}} to call 
 {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
 {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
 {{KeeperException}}, and so on...
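
The cycle described above can be reproduced in miniature. The following is only an illustrative sketch: the class LeaderLoopSketch and its methods are hypothetical stand-ins rather than the HBase classes, the exception is simulated, and the AtomicBoolean guard is one possible way to break such a cycle, not necessarily the fix that was applied.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the stop()/stepDownAsLeader() pair; not HBase code.
class LeaderLoopSketch {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    int stepDownCalls = 0;

    // Simulates stepDownAsLeader(): the ZooKeeper call always fails here,
    // and the failure handling leads back into stop().
    void stepDownAsLeader() {
        stepDownCalls++;
        try {
            throw new IllegalStateException("simulated KeeperException");
        } catch (IllegalStateException e) {
            stop();
        }
    }

    // The guard lets only the first caller proceed, which breaks the cycle.
    void stop() {
        if (!stopping.compareAndSet(false, true)) {
            return; // already stopping: do not re-enter
        }
        stepDownAsLeader();
    }

    public static void main(String[] args) {
        LeaderLoopSketch sketch = new LeaderLoopSketch();
        sketch.stop();
        System.out.println("stepDownAsLeader calls: " + sketch.stepDownCalls);
    }
}
```

Without the guard, stop() and stepDownAsLeader() would call each other for as long as the simulated failure persists.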

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156634#comment-13156634
 ] 

Hudson commented on HBASE-4856:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache maven 
repository not ready
HBASE-4856  Upgrade zookeeper to 3.4.0 release

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml


 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Created] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread gaojinchao (Created) (JIRA)
testRegionTransitionOperations occasional failures
--

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0


See these logs:
https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/

It seems that we should wait until the region is added to the online region set.

I made a patch; please review.
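
One common way to express "wait until the region is online" in a test is a bounded polling loop. The sketch below shows only the general shape of such a wait and is not the contents of the attached patch; waitFor, the timeout values, and the onlineRegions set are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

class WaitForOnlineSketch {
    // Polls the condition until it holds or the timeout elapses.
    // Returns false on timeout or if the waiting thread is interrupted.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
        // Simulates a region that comes online a little after the test asks for it.
        Thread opener = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                return;
            }
            onlineRegions.add("region-a");
        });
        opener.start();
        boolean online = waitFor(() -> onlineRegions.contains("region-a"), 5000, 10);
        opener.join();
        System.out.println("online: " + online);
    }
}
```

A bounded wait keeps a slow region open from failing the assertion while still letting the test fail if the region never comes online.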






[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4864:
--

Attachment: HBASE-4864_Branch92.patch

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156646#comment-13156646
 ] 

Ted Yu commented on HBASE-4864:
---

+1 on patch. 

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Status: Patch Available  (was: Open)

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156653#comment-13156653
 ] 

Hudson commented on HBASE-4853:
---

Integrated in HBase-TRUNK #2477 (See 
[https://builds.apache.org/job/HBase-TRUNK/2477/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


 HBASE-4789 does overzealous pruning of seqids
 -

 Key: HBASE-4853
 URL: https://issues.apache.org/jira/browse/HBASE-4853
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, 
 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, 
 4853-v9.txt, 4853.txt


 Working w/ J-D on failing replication test turned up hole in seqids made by 
 the patch over in hbase-4789.  With this patch in place we see lots of 
 instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
 old hlogs'
 At a minimum, these lines need removing:
 {code}
 diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 index 623edbe..a0bbe01 100644
 --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
 @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
        // Cleaning up of lastSeqWritten is in the finally clause because we
        // don't want to confuse getOldestOutstandingSeqNum()
        this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
 -      Long l = this.lastSeqWritten.remove(encodedRegionName);
 -      if (l != null) {
 -        LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? name=" +
 -          Bytes.toString(encodedRegionName) + ", seqid=" + l);
 -      }
        this.cacheFlushLock.unlock();
      }
    }
 {code}
 ... but above is no good w/o figuring why WALs are not being rotated off.
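
To illustrate why entries in the sequence-id map matter for log cleanup, here is a simplified model. It assumes that a log file may be removed only when the highest sequence id it contains is below the oldest outstanding (unflushed) sequence id across regions, and that an empty map means every old log may be deleted. The class and method names are hypothetical; this is not the HLog implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WalCleanupSketch {
    // lastSeqWritten: region name -> oldest sequence id not yet flushed.
    // logMaxSeqIds: highest sequence id contained in each closed log file.
    // Returns the entries of logMaxSeqIds whose log files are safe to remove.
    static List<Long> removableLogs(Map<String, Long> lastSeqWritten, List<Long> logMaxSeqIds) {
        if (lastSeqWritten.isEmpty()) {
            // Nothing outstanding: every old log is considered removable.
            return new ArrayList<>(logMaxSeqIds);
        }
        long oldestOutstanding = Collections.min(lastSeqWritten.values());
        List<Long> removable = new ArrayList<>();
        for (long maxSeqId : logMaxSeqIds) {
            if (maxSeqId < oldestOutstanding) {
                removable.add(maxSeqId);
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        List<Long> logs = Arrays.asList(1000L, 5000L, 9000L);
        Map<String, Long> map = new TreeMap<>();
        map.put("live-region", 8000L);
        System.out.println(removableLogs(map, logs)); // [1000, 5000]
        map.put("stale-parent", 900L);                // entry that is never cleaned up
        System.out.println(removableLogs(map, logs)); // []
        map.clear();                                  // entries pruned too eagerly
        System.out.println(removableLogs(map, logs)); // [1000, 5000, 9000]
    }
}
```

In this model a stale entry pins every log, while pruning too many entries makes every log look removable, which matches the two symptoms discussed in HBASE-4789 and in this issue.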





[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156654#comment-13156654
 ] 

Hudson commented on HBASE-4789:
---

Integrated in HBase-TRUNK #2477 (See 
[https://builds.apache.org/job/HBase-TRUNK/2477/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


 On split, parent region is sticking around in oldest sequenceid to region map 
 though not online; we don't cleanup WALs.
 ---

 Key: HBASE-4789
 URL: https://issues.apache.org/jira/browse/HBASE-4789
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt


 Here is log for a particular region:
 {code}
 2011-11-15 05:46:31,382 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
 2011-11-15 05:46:31,484 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META updated, and report to master. Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., new regions: TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8.. Split took 1sec
 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top, keycount=717559, bloomtype=NONE, size=711.1m
 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top, keycount=416691, bloomtype=NONE, size=412.9m
 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom, keycount=717559, bloomtype=NONE, size=711.1m
 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom, keycount=416691, bloomtype=NONE, size=412.9m
 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 8bbd7388262dc8cb1ce2cf4f04a7281d
 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, requester=null
 2011-11-15 05:58:01,358 INFO 

[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156710#comment-13156710
 ] 

Hadoop QA commented on HBASE-4864:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12505006/HBASE-4864_Branch92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/362//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/362//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/362//console

This message is automatically generated.

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156721#comment-13156721
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-TRUNK-security #8 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/8/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


 Fix some misspells and extraneous characters in logs; set some to TRACE
 ---

 Key: HBASE-4861
 URL: https://issues.apache.org/jira/browse/HBASE-4861
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4861.txt


 Some small clean up in logs.





[jira] [Created] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.

2011-11-24 Thread nkeywal (Created) (JIRA)
HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous 
but are actually synchronous.
-

 Key: HBASE-4865
 URL: https://issues.apache.org/jira/browse/HBASE-4865
 Project: HBase
  Issue Type: Bug
  Components: client, master
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Priority: Minor


The javadoc states that these methods are asynchronous, but the implementation 
on HMaster does not use the executorService; it calls process() directly. This 
is not true for all methods: enableTable, modifyTable, and disableTable are 
truly asynchronous.

The other impact is that the listeners are not called, as this is done by the 
executorService.


I don't know whether we have to change the documentation or the 
implementation. For consistency, I would change the implementation, but it may 
break existing code.


Two other comments:
1) There is no real naming pattern here, although one would be useful:
HBaseAdmin#createTable is synchronous and calls the asynchronous 
HMaster#createTable 
HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous 
HMaster#createTable 
HBaseAdmin#modifyTable is asynchronous and calls the asynchronous 
HMaster#modifyTable 
HBaseAdmin#modifyColumn is documented as asynchronous and calls the 
synchronous HMaster#modifyColumn

2) The coprocessor post semantics are not consistent across the services.
- When the service is synchronous, post is called after the service's 
execution (ex: addColumn with the current implementation).
- When the service is asynchronous, post is called after the executorService 
has registered the service to execute, but the service itself has not been 
executed yet.
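
The difference in post-hook timing can be illustrated with a small sketch. It does not use HBase classes: runSync, runAsync, and the events list are hypothetical, and a plain ExecutorService stands in for the master's executor service.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PostHookTimingSketch {
    final List<String> events = new CopyOnWriteArrayList<>();

    // Synchronous style: the handler runs inline, so "post" follows the work.
    void runSync(Runnable handler) {
        handler.run();
        events.add("post");
    }

    // Asynchronous style: the handler is only submitted; "post" is recorded
    // right after submission, possibly before the work has run.
    void runAsync(ExecutorService executor, Runnable handler) {
        executor.submit(handler);
        events.add("post");
    }

    public static void main(String[] args) throws Exception {
        PostHookTimingSketch syncSketch = new PostHookTimingSketch();
        syncSketch.runSync(() -> syncSketch.events.add("work"));
        System.out.println("sync:  " + syncSketch.events); // [work, post]

        PostHookTimingSketch asyncSketch = new PostHookTimingSketch();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        // The handler blocks until released, which makes the ordering deterministic.
        asyncSketch.runAsync(executor, () -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
                return;
            }
            asyncSketch.events.add("work");
        });
        release.countDown();
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("async: " + asyncSketch.events); // [post, work]
    }
}
```

A coprocessor that relies on post meaning "the operation has completed" would therefore see different behavior depending on which dispatch style the master method uses.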







[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156732#comment-13156732
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

The batch.installed was getting incremented twice. I will upload the patch 
shortly for review. I will report the test case results tomorrow morning, as 
the run will take time. 
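
Why a double increment could cause a hang can be shown with a toy model. The sketch assumes, purely for illustration, that the wait completes when the number of finished tasks equals the number of installed tasks; TaskBatchSketch is hypothetical and only borrows the names installed and done from this issue.

```java
// Toy model of a task batch; not the SplitLogManager implementation.
class TaskBatchSketch {
    int installed = 0;
    int done = 0;

    void install() { installed++; }

    void finish() { done++; }

    // A waiter may return only when every installed task is accounted for.
    boolean isComplete() { return done == installed; }

    public static void main(String[] args) {
        TaskBatchSketch correct = new TaskBatchSketch();
        correct.install();
        correct.finish();
        System.out.println("counted once, complete:  " + correct.isComplete()); // true

        TaskBatchSketch buggy = new TaskBatchSketch();
        buggy.install();
        buggy.install(); // the same task counted a second time
        buggy.finish();
        System.out.println("counted twice, complete: " + buggy.isComplete()); // false
    }
}
```

In the second case the completion condition can never become true, so a caller that waits on it would block indefinitely.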

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and an RS.
 The RS goes down (kill -9).
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there, they cannot be processed.
 Restart the master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug in fully.
 This may be the reason for the occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Assigned] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4864:
-

Assignee: gaojinchao

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

  Issue Type: Test  (was: Bug)
Hadoop Flags: Reviewed

 testRegionTransitionOperations occasional failures
 --

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Summary: TestMasterObserver#testRegionTransitionOperations occasionally 
fails  (was: testRegionTransitionOperations occasional failures)

 TestMasterObserver#testRegionTransitionOperations occasionally fails
 

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 TestMasterObserver#testRegionTransitionOperations occasionally fails
 

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156753#comment-13156753
 ] 

Ted Yu commented on HBASE-4864:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch Jinchao.

 TestMasterObserver#testRegionTransitionOperations occasionally fails
 

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156765#comment-13156765
 ] 

Ted Yu commented on HBASE-4862:
---

Nice work.
The patch doesn't apply to 0.90 branch:
{code}
Hunk #4 succeeded at 783 (offset -332 lines).
1 out of 4 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java.rej
...
patch unexpectedly ends in middle of line
2 out of 2 hunks ignored -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
{code}
Please rebase your patch for 0.90

A separate patch for TRUNK would be helpful for HadoopQA to run test suite.

Comments about the changes:
getTmpRecoveredEditsFileName() is only used once and there is no javadoc for 
it. Maybe we don't need to create the method, just append .tmp directly to 
the filename.
{code}
+// Convert file name ends with .tmp, so ensure region's replayRecoveredEdits
{code}
The beginning of the above should read 'Append filename with '.tmp' to ensure'

 Split hlog and open region concurrently happend may cause data loss
 ---

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region 
 A/recovered.edits/123456 and is appending log entries.
 2. The regionserver is opening region A now, and in 
 replayRecoveredEditsIfAny() it deletes the file region 
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log 
 file. If skipError = true, it adds the file to the corrupt logs. However, 
 data for other regions in this log file will be lost.
 4. Or, if skipError = false, it checks the filesystem. Of course, the file 
 system is OK, so it only prints an error log and continues assigning regions. 
 Therefore, data in other log files will also be lost!
 The case may happen as follows:
 1. Move a region from server A to server B.
 2. Kill server A and server B.
 3. Restart server A and server B.
 We could prevent this exception by forbidding deletion of a recovered.edits 
 file that the split hlog thread is still appending to.
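
The review comment above mentions a '.tmp' suffix, which suggests writing under a temporary name and renaming once the file is complete. A minimal sketch of that general pattern follows. It uses java.nio.file on a local filesystem rather than the Hadoop FileSystem API that HBase uses, and the class, method, and file names are hypothetical, so it should be read as an illustration of the idea and not as the attached patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TmpThenRenameSketch {
    // Writes to "<name>.tmp" and renames to the final name only when complete,
    // so a reader looking for the final name never sees a half-written file.
    static Path writeThenPublish(Path dir, String name, byte[] data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");
        Path target = dir.resolve(name);
        Files.write(tmp, data);
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered-edits-sketch");
        Path published = writeThenPublish(dir, "123456", "edits".getBytes(StandardCharsets.UTF_8));
        System.out.println("published: " + Files.exists(published));
        System.out.println("tmp left:  " + Files.exists(dir.resolve("123456.tmp")));
    }
}
```

With this arrangement, a replay step that skips names ending in .tmp would never delete a file that is still being appended to.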





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Summary: Splitting hlog and opening region concurrently may cause data loss 
 (was: Split hlog and open region concurrently happend may cause data loss)

 Splitting hlog and opening region concurrently may cause data loss
 --

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region 
 A/recovered.edits/123456 and is appending log entries.
 2. The regionserver is opening region A now, and in the process 
 replayRecoveredEditsIfAny() it will delete the file region 
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log 
 file. If skipError = true, it adds the file to the corrupt logs. However, 
 data in other regions in this log file will be lost.
 4. Or, if skipError = false, it will check the filesystem. Of course, the 
 filesystem is OK, so it only prints an error log and continues assigning 
 regions. Therefore, data in other log files will also be lost!
 The case may happen as follows:
 1. Move a region from server A to server B
 2. Kill server A and server B
 3. Restart server A and server B
 We could prevent this exception by forbidding deletion of a recovered.edits 
 file that is still being appended to by the split hlog thread.
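
 One way to read the proposal above is that a file still being written should be invisible to the region-open path. The following is a minimal sketch of that idea only. It assumes a hypothetical ".temp" suffix as the in-progress marker and uses the local filesystem through java.nio instead of HDFS; the class and method names are illustrative and are not taken from the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: local files stand in for HDFS, and ".temp" is a made-up marker.
class RecoveredEditsGuardDemo {
  static final String IN_PROGRESS_SUFFIX = ".temp";

  // Region-open side: only finished files may be replayed and then deleted.
  static List<String> replayableFiles(Path dir) throws IOException {
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        String name = p.getFileName().toString();
        if (!name.endsWith(IN_PROGRESS_SUFFIX)) {
          names.add(name);
        }
      }
    }
    Collections.sort(names);
    return names;
  }

  // Split side: publish the file under its final name once the writer is closed.
  static Path finishWriting(Path inProgress) throws IOException {
    String name = inProgress.getFileName().toString();
    String finalName = name.substring(0, name.length() - IN_PROGRESS_SUFFIX.length());
    return Files.move(inProgress, inProgress.resolveSibling(finalName));
  }

  // Returns the replayable file names seen while writing and after publishing.
  static List<List<String>> demo() {
    try {
      Path dir = Files.createTempDirectory("recovered.edits");
      Path writing = dir.resolve("123456" + IN_PROGRESS_SUFFIX);
      Files.write(writing, "edit-1\n".getBytes(StandardCharsets.UTF_8));
      List<String> during = replayableFiles(dir);
      finishWriting(writing);
      List<String> after = replayableFiles(dir);
      List<List<String>> result = new ArrayList<>();
      result.add(during);
      result.add(after);
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

 In this sketch the region-open side never sees the file while it is being appended to, so it cannot delete it early. Whether the split and the region open should ever overlap in the first place is a separate question that this does not address.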





[jira] [Assigned] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4862:
-

Assignee: chunhui shen

 Split hlog and open region concurrently happend may cause data loss
 ---

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region 
 A/recovered.edits/123456 and is appending log entries.
 2. The regionserver is opening region A now, and in the process 
 replayRecoveredEditsIfAny() it will delete the file region 
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log 
 file. If skipError = true, it adds the file to the corrupt logs. However, 
 data in other regions in this log file will be lost.
 4. Or, if skipError = false, it will check the filesystem. Of course, the 
 filesystem is OK, so it only prints an error log and continues assigning 
 regions. Therefore, data in other log files will also be lost!
 The case may happen as follows:
 1. Move a region from server A to server B
 2. Kill server A and server B
 3. Restart server A and server B
 We could prevent this exception by forbidding deletion of a recovered.edits 
 file that is still being appended to by the split hlog thread.





[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Fix Version/s: 0.90.5
   0.94.0
   0.92.0

 Split hlog and open region concurrently happend may cause data loss
 ---

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region 
 A/recovered.edits/123456 and is appending log entries.
 2. The regionserver is opening region A now, and in the process 
 replayRecoveredEditsIfAny() it will delete the file region 
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log 
 file. If skipError = true, it adds the file to the corrupt logs. However, 
 data in other regions in this log file will be lost.
 4. Or, if skipError = false, it will check the filesystem. Of course, the 
 filesystem is OK, so it only prints an error log and continues assigning 
 regions. Therefore, data in other log files will also be lost!
 The case may happen as follows:
 1. Move a region from server A to server B
 2. Kill server A and server B
 3. Restart server A and server B
 We could prevent this exception by forbidding deletion of a recovered.edits 
 file that is still being appended to by the split hlog thread.





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156770#comment-13156770
 ] 

Ted Yu commented on HBASE-4856:
---

Integrated to 0.92 and TRUNK after verifying that 3.4.0 artifacts could be 
pulled.

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156771#comment-13156771
 ] 

Ted Yu commented on HBASE-4865:
---

w.r.t. HBaseAdmin#createTable[Async] methods, see HBASE-3904 and HBASE-3229
We don't need to change their implementation now.

 HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as 
 asynchronous but are actually synchronous.
 -

 Key: HBASE-4865
 URL: https://issues.apache.org/jira/browse/HBASE-4865
 Project: HBase
  Issue Type: Bug
  Components: client, master
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Priority: Minor

 The javadoc states that these methods are asynchronous, but the 
 implementation in HMaster does not use the executorService and instead calls 
 process() directly. This is not true for all methods: enableTable, 
 modifyTable, and disableTable are truly asynchronous.
 The other impact is that the listeners are not called, as this is done by the 
 executorService.
 I don't know whether we should change the documentation or the 
 implementation. For consistency, I would change the implementation, but that 
 may break existing code.
 Two other comments:
 1) There is no real naming pattern here, although one would be useful:
 HBaseAdmin#createTable is synchronous and calls the asynchronous 
 HMaster#createTable 
 HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous 
 HMaster#createTable 
 HBaseAdmin#modifyTable is asynchronous and calls the asynchronous 
 HMaster#modifyTable 
 HBaseAdmin#modifyColumn is documented as asynchronous and calls the 
 synchronous HMaster#modifyColumn
 2) The coprocessor post semantics are not consistent across the services.
 - When the service is synchronous, post is called after the service's 
 execution (e.g. addColumn with the current implementation).
 - When the service is asynchronous, post is called after the executorService 
 has registered the service to execute, but the service itself has not been 
 executed yet.
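
 The difference in post-hook timing described in point 2 can be shown in isolation. This is a minimal sketch with made-up names (PostHookOrderDemo, the "process" and "post" event labels); it does not use HMaster or the coprocessor API. In the asynchronous variant a latch pins the ordering so the demo is deterministic; in a real server the relative order would be a race.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative stand-in; these are not HMaster or coprocessor classes.
class PostHookOrderDemo {

  // Synchronous style: the handler body runs inline, then the post hook fires.
  static List<String> runSynchronous() {
    List<String> events = new ArrayList<>();
    events.add("process");
    events.add("post");
    return events;
  }

  // Asynchronous style: the handler is only submitted before the post hook fires.
  static List<String> runAsynchronous() {
    List<String> events = Collections.synchronizedList(new ArrayList<String>());
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CountDownLatch postCalled = new CountDownLatch(1);
    executor.submit(() -> {
      postCalled.await();      // demo-only: pin the ordering the issue describes
      events.add("process");
      return null;
    });
    events.add("post");
    postCalled.countDown();
    executor.shutdown();
    try {
      executor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new ArrayList<>(events);
  }

  public static void main(String[] args) {
    System.out.println("sync:  " + runSynchronous());   // [process, post]
    System.out.println("async: " + runAsynchronous());  // [post, process]
  }
}
```

 A caller that relies on post meaning "the operation has completed" is only safe in the synchronous case.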





[jira] [Created] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline

2011-11-24 Thread Jonathan Hsieh (Created) (JIRA)
Fix possible NPE in AssignmentManager#regionOnline
--

 Key: HBASE-4866
 URL: https://issues.apache.org/jira/browse/HBASE-4866
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Jonathan Hsieh


NPE encountered in user's HMaster logs:

{code}
11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
   at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
   at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
{code}

From user list: 
http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E






[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156776#comment-13156776
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-TRUNK #2478 (See 
[https://builds.apache.org/job/HBase-TRUNK/2478/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


 Fix some misspells and extraneous characters in logs; set some to TRACE
 ---

 Key: HBASE-4861
 URL: https://issues.apache.org/jira/browse/HBASE-4861
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4861.txt


 Some small clean up in logs.





[jira] [Commented] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline

2011-11-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156779#comment-13156779
 ] 

Jonathan Hsieh commented on HBASE-4866:
---


Looks like it corresponds to this line which is AssignmentManager:724 on the 
0.90 branch

{code}
HServerInfo hsiWithoutLoad = new HServerInfo(
    serverInfo.getServerAddress(), serverInfo.getStartCode(),
    serverInfo.getInfoPort(), serverInfo.getHostname());
{code}

 Fix possible NPE in AssignmentManager#regionOnline
 --

 Key: HBASE-4866
 URL: https://issues.apache.org/jira/browse/HBASE-4866
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Jonathan Hsieh

 NPE encountered in user's HMaster logs:
 {code}
 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown.
 java.lang.NullPointerException
    at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
    at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
 {code}
 From user list: 
 http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Affects Version/s: 0.92.0
Fix Version/s: 0.92.0

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0


 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Fix Version/s: (was: 0.92.0)

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156820#comment-13156820
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

When the master restarts and sees splitlog nodes which are not processed the 
SplitLogManager does handleUnassignedTasks
{code}
Task task = findOrCreateOrphanTask(path);
{code}
As part of which 
{code}
task = tasks.putIfAbsent(path, orphanTask);
{code}
This task is added. Later in splitLogDistributed() we try to installTask().

Here we create the task if absent
{code}
Task oldtask = createTaskIfAbsent(path, batch);
{code}
Inside createTaskIfAbsent()
{code}
oldtask = tasks.putIfAbsent(path, new Task(batch));
if (oldtask != null && oldtask.isOrphan()) {
  LOG.info("Previously orphan task " + path +
      " is now being waited upon");
  oldtask.setBatch(batch);
  return (null);
}
{code}
the putIfAbsent returns the already added task so oldtask is not null.
Already while doing new Task(batch) 
{code}
Task(TaskBatch tb) {
  incarnation = 0;
  last_version = -1;
  deleted = false;
  setBatch(tb);
  setUnassigned();
}

public void setBatch(TaskBatch batch) {
  if (batch != null && this.batch != null) {
    LOG.fatal("logic error - batch being overwritten");
  }
  this.batch = batch;
  if (batch != null) {
    batch.installed++;
  }
}
{code}
the batch.installed++ happens.  Since the oldtask is not null once again we call
oldtask.setBatch(batch) making the batch.installed to increment once again.

This is why batch.done is not able to reach this batch.installed and hence the 
while loop keeps looping.
{code}
while ((batch.done + batch.error) != batch.installed) {
{code}

Pls correct me if my analysis is wrong.  I am uploading a patch which solved 
the problem.  Kindly validate the fix.
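
The double counting described above can be reproduced in isolation. This is a minimal sketch using simplified stand-ins for Task and TaskBatch, not the actual SplitLogManager classes; in particular, treating "batch == null" as the orphan condition is an assumption made for the demo. The guardedInstall variant is only one hypothetical way to avoid the extra increment and is not necessarily what the attached patch does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified stand-ins for illustration only; not the real SplitLogManager types.
class TaskBatch {
  int installed = 0;
  int done = 0;
  int error = 0;
}

class Task {
  TaskBatch batch;

  Task(TaskBatch tb) {
    setBatch(tb);
  }

  void setBatch(TaskBatch b) {
    this.batch = b;
    if (b != null) {
      b.installed++; // counted every time a non-null batch is attached
    }
  }

  // Assumption for this sketch: a task without a batch is an orphan.
  boolean isOrphan() {
    return batch == null;
  }
}

class DoubleInstallDemo {
  // Mirrors the flow in the comment: a throwaway Task is built before putIfAbsent.
  static int buggyInstall(ConcurrentMap<String, Task> tasks, String path, TaskBatch batch) {
    Task oldtask = tasks.putIfAbsent(path, new Task(batch)); // installed++ (first)
    if (oldtask != null && oldtask.isOrphan()) {
      oldtask.setBatch(batch);                               // installed++ (second)
    }
    return batch.installed;
  }

  // Hypothetical alternative: attach the batch only to the task that is in the map.
  static int guardedInstall(ConcurrentMap<String, Task> tasks, String path, TaskBatch batch) {
    Task fresh = new Task(null);                             // no increment yet
    Task oldtask = tasks.putIfAbsent(path, fresh);
    Task winner = (oldtask == null) ? fresh : oldtask;
    if (winner.isOrphan()) {
      winner.setBatch(batch);                                // single increment
    }
    return batch.installed;
  }

  public static void main(String[] args) {
    String path = "/hbase/splitlog/example-log"; // hypothetical node name

    ConcurrentMap<String, Task> a = new ConcurrentHashMap<>();
    a.put(path, new Task(null)); // orphan task found after master restart
    System.out.println("buggy installed   = " + buggyInstall(a, path, new TaskBatch()));   // 2

    ConcurrentMap<String, Task> b = new ConcurrentHashMap<>();
    b.put(path, new Task(null));
    System.out.println("guarded installed = " + guardedInstall(b, path, new TaskBatch())); // 1
  }
}
```

With one real task and installed == 2, the condition (batch.done + batch.error) != batch.installed can never become false, which matches the hang in the wait loop.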


 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821
 ] 

Ted Yu commented on HBASE-4863:
---

I got a compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  Time elapsed: 2.047 sec   ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer

  at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TThreadPoolServer.

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D531.1.patch, D531.2.patch


 This started as an internal hotfix where we found out that the Thrift server 
 spawned 15000 threads. To bound the thread pool size I added a custom thread 
 pool server implementation called HBaseThreadPoolServer into HBase codebase, 
 and made the following parameters configurable from both command line and as 
 config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
 Under an increasing load, the server creates new threads for every connection 
 before the pool size reaches minWorkerThreads. After that, the server puts 
 new connections into the queue and only creates a new thread when the queue 
 is full. If an attempt to create a new thread fails, the server drops 
 connection. The default TThreadPoolServer would crash in that case, but it 
 never happened because the thread pool was unbounded, so the server would 
 hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
 the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.
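
 The thread-creation and queuing order described above is the same as the documented behaviour of java.util.concurrent.ThreadPoolExecutor with a bounded work queue. This is a minimal sketch of that behaviour only, not the HBaseThreadPoolServer code: the parameter names are borrowed from the issue text, the values in main are arbitrary, and the mapping of a rejected task to "drop the connection" is an assumption.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only; parameter names mirror the issue, not the patch.
class BoundedPoolDemo {

  static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                    int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads, maxWorkerThreads,
        60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
  }

  // Submits blocking "connections" and returns {pool size, queued, rejected}.
  static int[] simulate(int min, int max, int queued, int connections) {
    ThreadPoolExecutor pool = newPool(min, max, queued);
    CountDownLatch release = new CountDownLatch(1);
    int rejected = 0;
    for (int i = 0; i < connections; i++) {
      try {
        pool.execute(() -> {
          try {
            release.await(); // keep the worker busy, like a long-lived connection
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      } catch (RejectedExecutionException e) {
        rejected++; // pool and queue are both full: the caller would drop the connection
      }
    }
    int[] result = { pool.getPoolSize(), pool.getQueue().size(), rejected };
    release.countDown();
    pool.shutdown();
    return result;
  }

  public static void main(String[] args) {
    // 2 core threads, then 3 queued, then 2 extra threads, then 3 rejected.
    System.out.println(Arrays.toString(simulate(2, 4, 3, 10))); // [4, 3, 3]
  }
}
```

 With an unbounded queue or an unbounded maximum pool size, the rejection branch is never reached, which is consistent with the unbounded growth reported in the issue.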





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156822#comment-13156822
 ] 

Ted Yu commented on HBASE-4855:
---

The above analysis makes sense.

Nice catch Ramkrishna.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Attachment: HBASE-4855.patch

TestDistributedLogSplitting is passing. Results for the other test cases will 
be available in the morning.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch


 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156828#comment-13156828
 ] 

Phabricator commented on HBASE-4863:


tedyu has commented on the revision [jira] [HBASE-4863] Make HBase Thrift 
server more configurable and add a command-line UI test.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:64
  Please add javadoc for the keys.
  These keys should be placed into hbase-default.xml
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:80
  Is TIME_TO_WAIT_AFTER_SHUTDOWN_MS a better name for this constant?

REVISION DETAIL
  https://reviews.facebook.net/D531


 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D531.1.patch, D531.2.patch


 This started as an internal hotfix where we found out that the Thrift server 
 spawned 15000 threads. To bound the thread pool size I added a custom thread 
 pool server implementation called HBaseThreadPoolServer into HBase codebase, 
 and made the following parameters configurable from both command line and as 
 config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
 Under an increasing load, the server creates new threads for every connection 
 before the pool size reaches minWorkerThreads. After that, the server puts 
 new connections into the queue and only creates a new thread when the queue 
 is full. If an attempt to create a new thread fails, the server drops 
 connection. The default TThreadPoolServer would crash in that case, but it 
 never happened because the thread pool was unbounded, so the server would 
 hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
 the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.





[jira] [Issue Comment Edited] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821
 ] 

Ted Yu edited comment on HBASE-4863 at 11/24/11 5:14 PM:
-

I got a compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  Time elapsed: 2.047 sec   ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer

  at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TBoundedThreadPoolServer (TThreadPoolServer is in thrift).

  was (Author: yuzhih...@gmail.com):
I got compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  
Time elapsed: 2.047 sec   ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method 
getColumnDescriptors() from the type TestThriftServer

  at 
org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TThreadPoolServer.
  
 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D531.1.patch, D531.2.patch


 This started as an internal hotfix where we found out that the Thrift server 
 spawned 15000 threads. To bound the thread pool size I added a custom thread 
 pool server implementation called HBaseThreadPoolServer into HBase codebase, 
 and made the following parameters configurable from both command line and as 
 config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
 Under an increasing load, the server creates new threads for every connection 
 before the pool size reaches minWorkerThreads. After that, the server puts 
 new connections into the queue and only creates a new thread when the queue 
 is full. If an attempt to create a new thread fails, the server drops 
 connection. The default TThreadPoolServer would crash in that case, but it 
 never happened because the thread pool was unbounded, so the server would 
 hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
 the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156830#comment-13156830
 ] 

Ted Yu commented on HBASE-4855:
---

+1 on patch.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch


 Start a master and RS
 RS goes down (kill -9)
 Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
 there it cannot be processed.
 Restart both master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I feel that batch.done is not getting incremented properly. I have not yet 
 dug into it fully.
 This may be the reason for occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156844#comment-13156844
 ] 

Hudson commented on HBASE-4864:
---

Integrated in HBase-TRUNK #2479 (See 
[https://builds.apache.org/job/HBase-TRUNK/2479/])
HBASE-4864  TestMasterObserver#testRegionTransitionOperations occasionally
   fails (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java


 TestMasterObserver#testRegionTransitionOperations occasionally fails
 

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 Look at these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.
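
The fix suggested above, waiting until the region shows up in the online region set before asserting, can be sketched as a small polling helper. This is only an illustration in modern Java: `WaitForOnline`, `waitFor`, and the `onlineRegions` set are hypothetical names, not code from the attached patch or from HBase test utilities.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

final class WaitForOnline {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true, false on timeout or interrupt.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for a region server's online region set.
        Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
        Thread opener = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate the region taking a while to open
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            onlineRegions.add("regionA");
        });
        opener.start();
        boolean online = waitFor(5000, 10, () -> onlineRegions.contains("regionA"));
        opener.join();
        System.out.println("regionA online: " + online);
    }
}
```

A test would call `waitFor` before checking the region-transition hooks, and fail with a clear message if it returns false, rather than asserting immediately after requesting the transition.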





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156858#comment-13156858
 ] 

Todd Lipcon commented on HBASE-4862:


wait, wait -- _why_ is this happening concurrently? A region should never be 
opened until the split process is done for that region. If this is happening we 
have a much larger issue, which we shouldn't be working around with tmp file 
names, etc.

 Splitting hlog and opening region concurrently may cause data loss
 --

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region
 A/recovered.edits/123456 and is appending log entries.
 2. The regionserver is now opening region A, and in
 replayRecoveredEditsIfAny() it deletes the file region
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log
 file. If skipError = true, the file is added to the corrupt logs. However,
 data for other regions in this log file will be lost.
 4. Or, if skipError = false, it checks the filesystem. Of course, the
 filesystem is OK, so it only prints an error log and continues assigning
 regions. Therefore, data in other log files will also be lost!
 The case may happen in the following scenario:
 1. Move a region from server A to server B.
 2. Kill server A and server B.
 3. Restart server A and server B.
 We could prevent this exception by forbidding deletion of a recovered.edits
 file that is still being appended to by the split hlog thread.
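
The proposal at the end of the description (do not delete a recovered.edits file while a writer is still appending to it) can be illustrated with a toy guard. This is a single-process sketch only: `RecoveredEditsGuard` and its methods are hypothetical, not the attached patch. In a real cluster the splitter and the region opener may run in different processes, so an in-memory set like this would not be enough by itself, and the comment above questions whether the two should ever overlap at all.

```java
import java.util.HashSet;
import java.util.Set;

final class RecoveredEditsGuard {
    // Paths that a split thread is currently appending to.
    private final Set<String> beingWritten = new HashSet<>();

    /** Registers a writer for the path; returns false if one was already registered. */
    synchronized boolean beginWrite(String path) {
        return beingWritten.add(path);
    }

    /** Marks the writer for the path as finished. */
    synchronized void endWrite(String path) {
        beingWritten.remove(path);
    }

    /**
     * Runs the deleter only if no writer is registered for the path.
     * The check and the delete happen under the same lock, so a writer
     * cannot register in between. Returns whether the delete ran.
     */
    synchronized boolean deleteIfIdle(String path, Runnable deleter) {
        if (beingWritten.contains(path)) {
            return false; // still being appended to: refuse to delete
        }
        deleter.run();
        return true;
    }

    public static void main(String[] args) {
        RecoveredEditsGuard guard = new RecoveredEditsGuard();
        String path = "A/recovered.edits/123456";
        guard.beginWrite(path);
        System.out.println("deleted while writing: "
            + guard.deleteIfIdle(path, () -> System.out.println("deleting " + path)));
        guard.endWrite(path);
        System.out.println("deleted after writing: "
            + guard.deleteIfIdle(path, () -> System.out.println("deleting " + path)));
    }
}
```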





[jira] [Resolved] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Ted Yu (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-4856.
---

Resolution: Fixed

 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156866#comment-13156866
 ] 

Ted Yu commented on HBASE-4862:
---

@Chunhui:
Can you attach master and region server log snippets which would show us what 
happened ?

Thanks

 Splitting hlog and opening region concurrently may cause data loss
 --

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch






[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156869#comment-13156869
 ] 

Hadoop QA commented on HBASE-4855:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505020/HBASE-4855.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/363//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/363//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/363//console

This message is automatically generated.

 SplitLogManager hangs on cluster restart. 
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch






[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

Summary: SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted  (was: SplitLogManager hangs on cluster restart. )
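
The new summary attributes the hang to batch.installed being counted twice. A minimal sketch of why a double count would block the waiter forever, assuming the waiter blocks until the done count equals the installed count. `TaskBatchSketch` and its methods are hypothetical and are not the SplitLogManager code; the timeout exists only so the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;

final class TaskBatchSketch {
    private int installed; // tasks the waiter expects to finish
    private int done;      // tasks reported as finished

    synchronized void install() {
        installed++;
    }

    synchronized void markDone() {
        done++;
        notifyAll();
    }

    /**
     * Waits until done == installed, or gives up after timeoutMs.
     * Returns true if the batch completed, false on timeout or interrupt.
     */
    synchronized boolean waitForTasks(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (done != installed) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // without this timeout the loop would never exit
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TaskBatchSketch ok = new TaskBatchSketch();
        ok.install();
        ok.markDone();
        System.out.println("counted once, completes: " + ok.waitForTasks(100));

        TaskBatchSketch buggy = new TaskBatchSketch();
        buggy.install();
        buggy.install(); // the same task counted twice
        buggy.markDone();
        System.out.println("counted twice, completes: " + buggy.waitForTasks(100));
    }
}
```

With one task counted twice, done can never catch up with installed, which matches the symptom of the master waiting indefinitely.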

 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch






[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156873#comment-13156873
 ] 

Ted Yu commented on HBASE-4855:
---

Failed test was due to 'Too many open files'

Patch integrated to 0.92 and TRUNK.

Thanks for the patch Ramkrishna.

 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch






[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156883#comment-13156883
 ] 

Phabricator commented on HBASE-4863:


tedyu has commented on the revision [jira] [HBASE-4863] Make HBase Thrift 
server more configurable and add a command-line UI test.

  Should similar changes in thrift/ThriftServer.java be applied to 
thrift2/ThriftServer.java ?

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:111
    Should this become a parameter the user can adjust?
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:263
    Should ttx.getType() be logged?
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java:179
    Should read 'Exactly one '

REVISION DETAIL
  https://reviews.facebook.net/D531


 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D531.1.patch, D531.2.patch


 This started as an internal hotfix where we found out that the Thrift server
 spawned 15000 threads. To bound the thread pool size I added a custom thread
 pool server implementation called HBaseThreadPoolServer into the HBase
 codebase, and made the following parameters configurable both from the
 command line and as config settings: minWorkerThreads, maxWorkerThreads, and
 maxQueuedRequests. Under an increasing load, the server creates a new thread
 for every connection until the pool size reaches minWorkerThreads. After
 that, the server puts new connections into the queue and only creates a new
 thread when the queue is full. If an attempt to create a new thread fails,
 the server drops the connection. The default TThreadPoolServer would crash
 in that case, but it never happened because the thread pool was unbounded,
 so the server would hang indefinitely, consume a lot of memory, and cause
 huge latency spikes on the client side.
 Another part of this fix is refactoring and unit testing of the command-line 
 part of the Thrift server. The logic there is sufficiently complicated, and 
 the existing ThriftServer class does not test that part at all. The new 
 TestThriftServerCmdLine test starts the Thrift server on a random port with 
 various combinations of options and talks to it through the client API from 
 another thread.
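
The policy described above (grow to minWorkerThreads, then queue, then grow to maxWorkerThreads only when the queue is full, then drop) matches the documented behavior of java.util.concurrent.ThreadPoolExecutor with a bounded queue. A minimal sketch under that assumption follows. Whether the patch's server is built this way is not stated in this message, and `BoundedPoolSketch`, `newPool`, and `countDropped` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolSketch {
    /** Builds a pool whose three limits mirror the three parameters named in the description. */
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads, maxWorkerThreads,
            60L, TimeUnit.SECONDS, // idle threads above the minimum exit after 60 s
            new LinkedBlockingQueue<Runnable>(maxQueuedRequests));
    }

    /** Submits the given number of long-lived "connections" and counts how many are dropped. */
    static int countDropped(int min, int max, int queue, int connections) {
        ThreadPoolExecutor pool = newPool(min, max, queue);
        CountDownLatch release = new CountDownLatch(1);
        int dropped = 0;
        try {
            for (int i = 0; i < connections; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // hold the worker, like an open connection
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    dropped++; // pool and queue are both full: drop the connection
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return dropped;
    }

    public static void main(String[] args) {
        // 2 core threads + 3 queued + 2 extra threads = 7 accepted, the rest dropped.
        System.out.println("dropped: " + countDropped(2, 4, 3, 10));
    }
}
```

The point of the bounded queue is that overload shows up as a small number of rejected connections instead of an unbounded number of threads.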





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.3.patch

mbautin updated the revision [jira] [HBASE-4863] Make HBase Thrift server more 
configurable and add a command-line UI test.
Reviewers: JIRA, Kannan, tedyu, stack

  Addressing Ted's comments. I will re-run unit tests and cluster tests, and 
post an update.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/main/resources/hbase-default.xml
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java


 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: D531.1.patch, D531.2.patch, D531.3.patch






[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Status: Patch Available  (was: Open)

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch






[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Attachment: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch

The same as D531.3.patch but generated using git format-patch --no-prefix 
HEAD^..HEAD so that it can be applied using the normal patch command.

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch






[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156907#comment-13156907
 ] 

Hudson commented on HBASE-4855:
---

Integrated in HBase-TRUNK #2481 (See 
[https://builds.apache.org/job/HBase-TRUNK/2481/])
HBASE-4855  SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java


 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch






[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156923#comment-13156923
 ] 

Hadoop QA commented on HBASE-4863:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12505038/0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 67 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.TestThreads

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/364//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/364//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/364//console

This message is automatically generated.

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch






[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4855.patch






[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156956#comment-13156956
 ] 

Ted Yu commented on HBASE-4863:
---

In thrift2/ThriftServer.java:
{code}
  } else {
    server = getTThreadPoolServer(protocolFactory, processor,
        transportFactory, inetSocketAddress);
{code}
where
{code}
    TThreadPoolServer.Args serverArgs =
        new TThreadPoolServer.Args(serverTransport);
{code}
It would be nice to incorporate TBoundedThreadPoolServer into the above module. 
This can be done in a separate JIRA.

 Make HBase Thrift server more configurable and add a command-line UI test
 -

 Key: HBASE-4863
 URL: https://issues.apache.org/jira/browse/HBASE-4863
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Attachments: 
 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
 D531.2.patch, D531.3.patch






[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156964#comment-13156964
 ] 

Ted Yu commented on HBASE-4863:
---

{code}
testSleepWithoutInterrupt(org.apache.hadoop.hbase.util.TestThreads)  Time elapsed: 5.004 sec   FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.util.TestThreads.testSleepWithoutInterrupt(TestThreads.java:57)
{code}
points to this line:
{code}
  assertTrue(sleeper.isInterrupted());
{code}





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156965#comment-13156965
 ] 

chunhui shen commented on HBASE-4862:
-

@Ted Yu @Todd Lipcon

The two operations run concurrently in the following case:
1. Move a region from server A to server B (for example, during balancing).
2. Kill server A and server B.
3. Restart server A and server B immediately.

Before we restart server A and server B, log data for this region appears in 
both servers' log files.
4. After we restart server B, serverShutdownHandler processes this dead server 
and assigns this region.
5. At the same time, serverShutdownHandler processes dead server B and splits 
server B's hlog.
Because 4 and 5 run concurrently, replayRecoveredEditsIfAny in 4 and the 
appending of log entries to this region's recovered.edits file are concurrent. 
So when the recovered.edits file is deleted by replayRecoveredEdits, an 
exception is thrown.

The master and region server logs in this case are as follows:

master log: 
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist. [Lease. Holder: DFSClient_hb_m_dw75.kgb.sqa.cm4:6_1321413286871, pendingcreates: 54]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898)



regionserver log: 
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Deleted recovered.edits file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103


[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156967#comment-13156967
 ] 

chunhui shen commented on HBASE-4862:
-

After the region has been moved successfully from server A to server B, 
this region's entries in server A's log file are not a problem, because they 
have already been flushed. 
However, if this exception is hit while splitting the hlog, it affects the log 
data of other regions in server A's log file.

 Splitting hlog and opening region concurrently may cause data loss
 --

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch


 Case Description:
 1. The split hlog thread creates a writer for the file region 
 A/recovered.edits/123456 and is appending log entries to it.
 2. The regionserver is opening region A at the same time, and in 
 replayRecoveredEditsIfAny() it deletes the file region 
 A/recovered.edits/123456.
 3. The split hlog thread catches the IO exception and stops parsing this log 
 file. If skipError = true, the file is added to the corrupt logs. However, 
 data for other regions in this log file will be lost.
 4. If skipError = false, it checks the filesystem. Of course, the filesystem 
 is OK, so it only prints an error log and continues assigning regions. 
 Therefore, data in other log files will also be lost!
 The case may happen as follows:
 1. Move a region from server A to server B.
 2. Kill server A and server B.
 3. Restart server A and server B.
 We could prevent this exception by forbidding deletion of a recovered.edits 
 file that is still being appended to by the split hlog thread.
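
One way to keep the region-open path from deleting a file that the splitter is still appending to is to write under a temporary name and rename on completion. The sketch below illustrates that idea on the local filesystem with java.nio; it is an assumption for illustration only, not the content of the attached patch. The class name, method names and the ".temp" suffix are hypothetical, and HDFS is replaced by local files.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch: names and the ".temp" suffix are assumptions. */
final class RecoveredEditsSketch {
    static final String IN_PROGRESS_SUFFIX = ".temp";

    /** Splitter side: append under a temporary name, then publish by rename. */
    static Path writeEdits(Path editsDir, String name, List<String> entries) throws IOException {
        Files.createDirectories(editsDir);
        Path inProgress = editsDir.resolve(name + IN_PROGRESS_SUFFIX);
        try (BufferedWriter writer = Files.newBufferedWriter(inProgress, StandardCharsets.UTF_8)) {
            for (String entry : entries) {
                writer.write(entry);
                writer.newLine();
            }
        }
        // Only a fully written file ever becomes visible under its final name.
        return Files.move(inProgress, editsDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Region-open side: only completed files may be replayed and deleted. */
    static List<Path> replayableFiles(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> files = Files.list(editsDir)) {
            files.filter(p -> !p.getFileName().toString().endsWith(IN_PROGRESS_SUFFIX))
                 .sorted()
                 .forEach(result::add);
        }
        return result;
    }
}
```

The design choice here is that the reader never needs a lock: it simply ignores any file whose name marks it as in progress.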





[jira] [Created] (HBASE-4867) A tool to merge configuration files

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
A tool to merge configuration files
---

 Key: HBASE-4867
 URL: https://issues.apache.org/jira/browse/HBASE-4867
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


With our cluster configuration setup it would be good to have a tool that 
merges HBase configuration files, so that files appearing later in the list 
override properties specified in earlier files. This way we could merge an 
application-specific configuration file with the cluster-specific configuration 
file (with the latter overriding the former) and produce a single HBase 
configuration file to install on the cluster.
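
The override rule described above can be sketched as a simple ordered merge. This is a minimal illustration that assumes each configuration file has already been parsed into a map of property names to values; the class ConfigMergeSketch is a hypothetical name, and reading and writing the actual HBase XML files is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "later files override earlier files". */
final class ConfigMergeSketch {
    /** Merges property maps in list order; later maps override earlier ones. */
    static Map<String, String> merge(List<Map<String, String>> configsInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> config : configsInOrder) {
            // putAll replaces the value of any key that is already present.
            merged.putAll(config);
        }
        return merged;
    }
}
```

Passing the application-specific map first and the cluster-specific map second gives the behaviour described in the issue: a property set in both ends up with the cluster-specific value, while properties set in only one of them are carried over unchanged.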





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Fix Version/s: 0.92.0

Thanks, Ted, for reviewing and committing the patch.
Updating the fix version to 0.92.

 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
 --

 Key: HBASE-4855
 URL: https://issues.apache.org/jira/browse/HBASE-4855
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: HBASE-4855.patch


 Start a master and an RS.
 The RS goes down (kill -9).
 Wait for ServerShutDownHandler to create the splitlog nodes. As there is no 
 RS, they cannot be processed.
 Restart the master and bring up an RS.
 The master hangs in SplitLogManager.waitforTasks().
 I suspect that batch.done is not getting incremented properly. I have not 
 dug into it fully yet.
 This may be the reason for the occasional failure of 
 TestDistributedLogSplitting.testWorkerAbort().
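
The hang can be pictured with a small counter model. The sketch below assumes, based only on the issue title and the counter names mentioned here, that the master waits until the number of finished tasks equals the number of installed tasks; the class TaskBatchSketch and its methods are hypothetical and are not the SplitLogManager code.

```java
/** Hypothetical model of a batch of split-log tasks; not the real SplitLogManager. */
final class TaskBatchSketch {
    private int installed;
    private int done;
    private int error;

    void taskInstalled() { installed++; }
    void taskDone() { done++; }
    void taskFailed() { error++; }

    /** Assumed wait condition: every installed task has finished or failed. */
    boolean isComplete() {
        return done + error == installed;
    }
}
```

Under this assumption, counting one task as installed twice leaves isComplete() false even after the task finishes, so a loop that waits on it would never exit.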





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13156991#comment-13156991
 ] 

chunhui shen commented on HBASE-4862:
-

@Ted @Todd

I'm sorry my explanation was not clear.
I think I should describe the detailed case first.

Throughout the following process, a client is putting data to region C.
1. Region C is successfully moved from server A to server B.
At this moment, there are log entries for region C in both server A's log file 
and server B's log file.

2. Kill server A and server B.

3. Restart server B.
Now the master starts serverShutdownHandler for server B and assigns region C 
to server D.

4. Before region C is opened on server D, restart server A.
Now the master starts serverShutdownHandler for server A and splits server A's 
log file.
Because there are log entries for region C in server A's log file (why? see 1), 
the split hlog thread creates a file F in region C's recovered.edits 
directory.

5. While region C is being opened, replayRecoveredEdits() is executed and 
file F is then deleted.

6. Therefore, step 4 throws an IOException saying that file F does not exist. 
This stops the parsing of server A's current hlog file, and the other data in 
that hlog file is lost.

The posted region server log is server B's log, and it is doing 
replayRecoveredEditsIfAny(). Although it prints "Failed delete of" for the file 
recovered.edits/13156791680, the file has in fact been deleted, 
and the master throws a file-does-not-exist exception:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist.

I'm not sure whether this is clear now; I'm waiting for your questions.

Thanks!







[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-4862:


Attachment: hbase-4862v1 for 0.90.diff

 Splitting hlog and opening region concurrently may cause data loss
 --

 Key: HBASE-4862
 URL: https://issues.apache.org/jira/browse/HBASE-4862
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.92.0, 0.94.0, 0.90.5

 Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
 trunk.diff






[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-4862:


Attachment: hbase-4862v1 for trunk.diff





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157000#comment-13157000
 ] 

Lars Hofhansl commented on HBASE-4838:
--

I pinpointed the difference to the compactions of the daughters (again with 
just 2 keys):

in 0.92 (with this patch) I see this for the 1st daughter region (which is 
compacted last):

{noformat}
2011-11-24 22:08:51,324 INFO  [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.HRegion(1012): Starting compaction on testFamily in region testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284.
2011-11-24 22:08:51,332 INFO  [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(725): Starting compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284. into tmpdir=hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp, seqid=3, totalSize=662.0
2011-11-24 22:08:51,333 DEBUG [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(1174): Compacting hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/85a0a11b15a248c69e09e44e0e9e052e.4e293f99103a49243c16eb104996554b-hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/4e293f99103a49243c16eb104996554b/testFamily/85a0a11b15a248c69e09e44e0e9e052e-bottom, keycount=2, bloomtype=NONE, size=662.0
2011-11-24 22:08:51,388 INFO  [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(1322): Renaming compacted file at hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp/7e7f4acb121e4696bd3c7d64e26a66b9 to hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/7e7f4acb121e4696bd3c7d64e26a66b9
2011-11-24 22:08:51,402 INFO  [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(746): Completed major compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284. into 7e7f4acb121e4696bd3c7d64e26a66b9, size=662.0; total size for store is 662.0
{noformat}

in trunk I see this for the 1st daughter region:

{noformat}
2011-11-24 22:15:18,205 INFO  [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.HRegion(1097): Starting compaction on testFamily in region testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2.
2011-11-24 22:15:18,206 INFO  [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(797): Starting compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2. into tmpdir=hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/.tmp, seqid=3, totalSize=718.0
2011-11-24 22:15:18,206 DEBUG [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(1255): Compacting hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/testFamily/64908313825b4c0599b86c26b33797e3.215be88f57f1ca63b6ead035b39c4d2e-hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/215be88f57f1ca63b6ead035b39c4d2e/testFamily/64908313825b4c0599b86c26b33797e3-bottom, keycount=2, bloomtype=NONE, size=718.0
2011-11-24 22:15:18,211 INFO  [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(818): Completed major compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2. into none, size=none; total size for store is 0.0
{noformat}

The keys in both cases are aaa and aab and the split key is aaa, so the 1st 
region (''-'aaa') should indeed be empty after compaction. In trunk it is 
correctly compacted to an empty file.
In 0.92 it somehow wrote out the entire file again (so the keys are found in 
the store files for both regions).
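
To make the expectation concrete: a region covers the half-open key range from its start key (inclusive) to its end key (exclusive), so with split key aaa the first daughter holds neither aaa nor aab. The following is a minimal sketch of that range check; RegionRangeSketch is a hypothetical name, and plain String comparison stands in for HBase's byte-wise key comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of which keys fall into a region's key range. */
final class RegionRangeSketch {
    /**
     * Returns the keys in the half-open range [startKey, endKey).
     * An empty endKey means the range has no upper bound.
     */
    static List<String> keysInRegion(List<String> keys, String startKey, String endKey) {
        List<String> result = new ArrayList<>();
        for (String key : keys) {
            boolean atOrAfterStart = key.compareTo(startKey) >= 0;
            boolean beforeEnd = endKey.isEmpty() || key.compareTo(endKey) < 0;
            if (atOrAfterStart && beforeEnd) {
                result.add(key);
            }
        }
        return result;
    }
}
```

For the keys aaa and aab, the range from the empty start key to aaa selects nothing, while the range from aaa with no upper bound selects both, which is why a correct compaction of the first daughter should produce an empty store.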


 Port 2856 (TestAcidGuarantee is failing) to 0.92
 

 Key: HBASE-4838
 URL: https://issues.apache.org/jira/browse/HBASE-4838
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4838-v1.txt


 Moving the backport into a separate issue (as suggested by JonH), because 
 this is not trivial.


[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157008#comment-13157008
 ] 

Hudson commented on HBASE-4855:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4855  SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java






[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157007#comment-13157007
 ] 

Hudson commented on HBASE-4856:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4856  Upgrade zookeeper to 3.4.0 release

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml


 Upgrade zookeeper to 3.4.0 release
 --

 Key: HBASE-4856
 URL: https://issues.apache.org/jira/browse/HBASE-4856
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: 4856.txt


 Zookeeper 3.4.0 has been released.
 We should upgrade.





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157005#comment-13157005
 ] 

Hudson commented on HBASE-4864:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4864  TestMasterObserver#testRegionTransitionOperations occasionally fails (Gao Jinchao)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java


 TestMasterObserver#testRegionTransitionOperations occasionally fails
 

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Test
  Components: test
Reporter: gaojinchao
Assignee: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0

 Attachments: HBASE-4864_Branch92.patch


 See these logs:
 https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
 It seems that we should wait until the region is added to the online region set.
 I made a patch; please review.





[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157006#comment-13157006
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


 Fix some misspells and extraneous characters in logs; set some to TRACE
 ---

 Key: HBASE-4861
 URL: https://issues.apache.org/jira/browse/HBASE-4861
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 4861.txt


 Some small clean up in logs.
