[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157006#comment-13157006
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


> Fix some misspells and extraneous characters in logs; set some to TRACE
> ---
>
> Key: HBASE-4861
> URL: https://issues.apache.org/jira/browse/HBASE-4861
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 4861.txt
>
>
> Some small clean up in logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157008#comment-13157008
 ] 

Hudson commented on HBASE-4855:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4855  SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java


> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: HBASE-4855.patch
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I suspect that batch.done is not being incremented properly, but I have not dug 
> into it fully yet.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 
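
The hang reported above can be illustrated with a minimal sketch. It assumes that the waiter blocks until the number of finished tasks equals the number of installed tasks. The class and method names (TaskBatchSketch, installNaive, installOnce, markDone) are hypothetical and are not the SplitLogManager code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a batch of split-log tasks; not the HBase class.
final class TaskBatchSketch {
    private final Set<String> knownTasks = new HashSet<>();
    private int installed;
    private int done;

    // Counts every call, so a task path that is seen twice inflates
    // the installed counter.
    synchronized void installNaive(String taskPath) {
        knownTasks.add(taskPath);
        installed++;
    }

    // Counts a task only the first time its path is seen.
    synchronized void installOnce(String taskPath) {
        if (knownTasks.add(taskPath)) {
            installed++;
        }
    }

    // Records one finished task and wakes up any waiter.
    synchronized void markDone() {
        done++;
        notifyAll();
    }

    // A waiter would loop until this returns true.
    synchronized boolean isComplete() {
        return done == installed;
    }
}
```

With installNaive, one task installed twice and finished once leaves isComplete() false forever, which would match a waiter that never returns. With installOnce the counters agree.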





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157007#comment-13157007
 ] 

Hudson commented on HBASE-4856:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4856  Upgrade zookeeper to 3.4.0 release

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml


> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>
> Zookeeper 3.4.0 has been released.
> We should upgrade.





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157005#comment-13157005
 ] 

Hudson commented on HBASE-4864:
---

Integrated in HBase-0.92-security #13 (See 
[https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4864  TestMasterObserver#testRegionTransitionOperations occasionally
   fails (Gao Jinchao)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java


> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region set.
> I made a patch; please review.





[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92

2011-11-24 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157000#comment-13157000
 ] 

Lars Hofhansl commented on HBASE-4838:
--

I pinpointed the difference to the compactions of the daughters (again with 
just 2 keys):

in 0.92 (with this patch) I see this for the 1st daughter region (which is 
compacted last):

{noformat}
2011-11-24 22:08:51,324 INFO  
[RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] 
regionserver.HRegion(1012): Starting compaction on testFamily in region 
testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284.
2011-11-24 22:08:51,332 INFO  
[RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] 
regionserver.Store(725): Starting compaction of 1 file(s) in testFamily of 
testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284.
 into 
tmpdir=hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp,
 seqid=3, totalSize=662.0
2011-11-24 22:08:51,333 DEBUG 
[RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] 
regionserver.Store(1174): Compacting 
hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/85a0a11b15a248c69e09e44e0e9e052e.4e293f99103a49243c16eb104996554b-hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/4e293f99103a49243c16eb104996554b/testFamily/85a0a11b15a248c69e09e44e0e9e052e-bottom,
 keycount=2, bloomtype=NONE, size=662.0
2011-11-24 22:08:51,388 INFO  
[RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] 
regionserver.Store(1322): Renaming compacted file at 
hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp/7e7f4acb121e4696bd3c7d64e26a66b9
 to 
hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/7e7f4acb121e4696bd3c7d64e26a66b9
2011-11-24 22:08:51,402 INFO  
[RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] 
regionserver.Store(746): Completed major compaction of 1 file(s) in testFamily 
of 
testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284.
 into 7e7f4acb121e4696bd3c7d64e26a66b9, size=662.0; total size for store is 
662.0
{noformat}

in trunk I see this for the 1st daughter region:

{noformat}
2011-11-24 22:15:18,205 INFO  
[RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] 
regionserver.HRegion(1097): Starting compaction on testFamily in region 
testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2.
2011-11-24 22:15:18,206 INFO  
[RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] 
regionserver.Store(797): Starting compaction of 1 file(s) in testFamily of 
testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2.
 into 
tmpdir=hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/.tmp,
 seqid=3, totalSize=718.0
2011-11-24 22:15:18,206 DEBUG 
[RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] 
regionserver.Store(1255): Compacting 
hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/testFamily/64908313825b4c0599b86c26b33797e3.215be88f57f1ca63b6ead035b39c4d2e-hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/215be88f57f1ca63b6ead035b39c4d2e/testFamily/64908313825b4c0599b86c26b33797e3-bottom,
 keycount=2, bloomtype=NONE, size=718.0
2011-11-24 22:15:18,211 INFO  
[RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] 
regionserver.Store(818): Completed major compaction of 1 file(s) in testFamily 
of 
testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2.
 into none, size=none; total size for store is 0.0
{noformat}

The keys in both cases are aaa and aab and the split key is aaa, so the 1st 
region (''-'aaa') should indeed be empty after compaction. In trunk it is 
correctly compacted to an empty file.
In 0.92 it somehow wrote out the entire file again (so the keys are found in 
the store files for both regions).
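
The expected result can be checked with a small sketch. It assumes that the first daughter covers the half-open range up to (but excluding) the split key and that keys compare as plain strings; the class and method names are hypothetical and this is not the HBase compaction code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of which keys each daughter should keep.
final class DaughterRangeSketch {
    // First daughter: keys strictly below the split key.
    static List<String> firstDaughterKeys(List<String> parentKeys, String splitKey) {
        List<String> kept = new ArrayList<>();
        for (String key : parentKeys) {
            if (key.compareTo(splitKey) < 0) {
                kept.add(key);
            }
        }
        return kept;
    }

    // Second daughter: keys at or above the split key.
    static List<String> secondDaughterKeys(List<String> parentKeys, String splitKey) {
        List<String> kept = new ArrayList<>();
        for (String key : parentKeys) {
            if (key.compareTo(splitKey) >= 0) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

For keys aaa and aab with split key aaa, the first daughter keeps nothing and the second keeps both keys, which is the trunk behavior shown above.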


> Port 2856 (TestAcidGuarantee is failing) to 0.92
> 
>
> Key: HBASE-4838
> URL: https://issues.apache.org/jira/browse/HBASE-4838
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4838-v1.txt
>
>
> Moving the backport into a separate issue (as suggested by JonH), because this 
> is not trivial.


[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-4862:


Attachment: hbase-4862v1 for trunk.diff

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
> trunk.diff
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, the 
> data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is fine, so 
> it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> This case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split hlog thread is still appending to.
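
One possible way to keep the replayer from deleting a file that is still being appended to is sketched below. It is only an assumption for illustration, not the attached patch: the writer appends to a file with a temporary suffix and renames it when finished, and the replayer ignores files that carry the suffix. The sketch uses java.nio.file on a local directory instead of HDFS, and the names RecoveredEditsSketch, TEMP_SUFFIX, writeEdits and replayableFiles are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch on a local filesystem; not the HBase implementation.
final class RecoveredEditsSketch {
    static final String TEMP_SUFFIX = ".temp"; // assumed in-progress marker

    // Writer side: write to a temp file, then publish it with a rename.
    static Path writeEdits(Path dir, String name, List<String> edits) throws IOException {
        Path tmp = dir.resolve(name + TEMP_SUFFIX);
        Files.write(tmp, edits, StandardCharsets.UTF_8);
        return Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Replayer side: only finished files may be replayed and deleted.
    static List<Path> replayableFiles(Path dir) throws IOException {
        List<Path> finished = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(p -> !p.getFileName().toString().endsWith(TEMP_SUFFIX))
                   .sorted()
                   .forEach(finished::add);
        }
        return finished;
    }
}
```

Because the writer and the replayer run in different processes in the scenario above, the coordination is expressed through file names rather than through in-memory state.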





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-4862:


Attachment: hbase-4862v1 for 0.90.diff

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 
> trunk.diff
>
>





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991
 ] 

chunhui shen commented on HBASE-4862:
-

@Ted @Todd

I'm sorry my explanation was not clear.
Let me describe the detailed case first.

Throughout the following process, a client is putting data to region C.
1. Region C is successfully moved from server A to server B.
At this moment, there are log entries for region C in both server A's log file 
and server B's log file.

2. Kill server A and server B.

3. Restart server B.
Now the master starts ServerShutdownHandler for server B and assigns region C 
to server D.

4. Before region C is opened on server D, restart server A.
Now the master starts ServerShutdownHandler for server A and splits server A's 
log file.
Because there are log entries for region C in server A's log file (see step 1), 
the split hlog thread creates a file F in region C's recovered.edits 
directory.

5. While region C is opening, it executes replayRecoveredEdits() and then 
deletes file F.

6. Therefore, the split in step 4 throws an IOException saying that file F does 
not exist, which stops the parsing of server A's current hlog file. The other 
data in that hlog file is lost.

The posted region server log is server B's log, and it is doing 
replayRecoveredEditsIfAny(). Although it prints a failed delete of the file 
recovered.edits/13156791680, the file has in fact been deleted, 
and the master throws a file-not-exist exception:
2011-11-16 11:50:13,037 FATAL 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while 
writing log entry to log 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
 File does not exist.
 
I'm not sure whether this is clear now; I'm happy to answer any questions.

Thanks!



> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Fix Version/s: 0.92.0

Thanks Ted for your review and committing the patch.
Updating fix versions as 0.92.

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: HBASE-4855.patch
>
>





[jira] [Created] (HBASE-4867) A tool to merge configuration files

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
A tool to merge configuration files
---

 Key: HBASE-4867
 URL: https://issues.apache.org/jira/browse/HBASE-4867
 Project: HBase
  Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor


With our cluster configuration setup it would be good to have a tool that 
merges HBase configuration files, so that files appearing later in the list 
override properties specified in earlier files. This way we could merge an 
application-specific configuration file with the cluster-specific configuration 
file (with the latter overriding the former) and produce a single HBase 
configuration file to install on the cluster.
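
The override rule can be sketched in a few lines. This is only an illustration that assumes each configuration file has already been parsed into a map of property names to values; the class and method names are hypothetical and the property values in the usage note are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "later files override earlier files".
final class ConfigMergeSketch {
    // Merges the given configurations in list order; a property set in a
    // later map replaces the value from any earlier map.
    static Map<String, String> merge(List<Map<String, String>> configsInOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> config : configsInOrder) {
            merged.putAll(config);
        }
        return merged;
    }
}
```

For example, if the application-specific map sets hbase.regionserver.handler.count to 10 and the cluster-specific map sets it to 50, merging them in that order yields 50, while properties set only in the application-specific map are kept.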





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156967#comment-13156967
 ] 

chunhui shen commented on HBASE-4862:
-

After the region is successfully moved from server A to server B,
the entries for this region in server A's log file are not a problem because 
they have already been flushed,
but the log data of other regions in server A's log file is affected if this 
exception is hit while splitting the hlog.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread chunhui shen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156965#comment-13156965
 ] 

chunhui shen commented on HBASE-4862:
-

@Ted Yu @Todd Lipcon

It can happen concurrently in the following case:
1. Move a region from server A to server B (for example, during balancing).
2. Kill server A and server B.
3. Restart server A and server B immediately.

Before server A and server B are restarted, log data for this region appears in 
both servers' log files.
4. After server B is restarted, ServerShutdownHandler processes this dead server 
and assigns this region.
5. At the same time, ServerShutdownHandler processes dead server B and 
splits server B's hlog.
Because steps 4 and 5 run concurrently, replayRecoveredEditsIfAny in step 4 and 
the appending of log entries to this region's recovered.edits file also run 
concurrently. So when the recovered.edits file is deleted by 
replayRecoveredEdits, an exception is thrown.

The master and region server logs in this case are as follows:

master log: 
2011-11-16 11:50:13,037 FATAL 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while 
writing log entry to log 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: 
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
 File does not exist. [Lease. Holder: 
DFSClient_hb_m_dw75.kgb.sqa.cm4:6_1321413286871, pendingcreates: 54] 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
 
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
 
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649) 
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 
at java.lang.reflect.Method.invoke(Method.java:597) 
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) 
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415) 
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:396) 
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
 
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409) 

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) 
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 
at java.lang.reflect.Constructor.newInstance(Constructor.java:513) 
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
 
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49)
 
at 
org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
 
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962)
 
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926)
 
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898)
 



regionserver log: 
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: 
Failed delete of 
hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Deleted recovered.edits 
file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>

[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156964#comment-13156964
 ] 

Ted Yu commented on HBASE-4863:
---

{code}
testSleepWithoutInterrupt(org.apache.hadoop.hbase.util.TestThreads)  Time 
elapsed: 5.004 sec  <<< FAILURE!
java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at 
org.apache.hadoop.hbase.util.TestThreads.testSleepWithoutInterrupt(TestThreads.java:57)
{code}
points to this line:
{code}
  assertTrue(sleeper.isInterrupted());
{code}

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.
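
The pool behavior described above (new threads up to minWorkerThreads, then queueing, then extra threads only when the queue is full) matches the documented semantics of java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below only illustrates that behavior; it is not the HBaseThreadPoolServer from the patch, and BoundedPoolSketch and newBoundedPool are hypothetical names.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a bounded worker pool; not the patch's server class.
final class BoundedPoolSketch {
    static ThreadPoolExecutor newBoundedPool(int minWorkerThreads,
                                             int maxWorkerThreads,
                                             int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads,              // threads created before queueing starts
            maxWorkerThreads,              // hard upper bound on threads
            60L, TimeUnit.SECONDS,         // idle timeout for the extra threads (assumed value)
            new LinkedBlockingQueue<Runnable>(maxQueuedRequests),
            // Throws RejectedExecutionException once both the queue and the
            // pool are full; a server could drop the connection at that point.
            new ThreadPoolExecutor.AbortPolicy());
    }
}
```

With minWorkerThreads = 1, maxWorkerThreads = 2 and maxQueuedRequests = 1, three blocking requests result in two threads and one queued request, and a fourth request is rejected instead of spawning another thread.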





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156956#comment-13156956
 ] 

Ted Yu commented on HBASE-4863:
---

In thrift2/ThriftServer.java:
{code}
  } else {
server = getTThreadPoolServer(protocolFactory, processor, 
transportFactory, inetSocketAddress);
{code}
where
{code}
TThreadPoolServer.Args serverArgs = new 
TThreadPoolServer.Args(serverTransport);
{code}
It would be nice to incorporate TBoundedThreadPoolServer into the above module. 
This can be done in a separate JIRA.

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.
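
A minimal sketch of the sizing behaviour described above, using the JDK's java.util.concurrent.ThreadPoolExecutor, whose documented policy (core threads first, then the queue, then extra threads up to the maximum, then rejection) matches the description. The class name BoundedPoolSketch, the simulate helper and the numbers are illustrative assumptions, not code from the patch; a rejected task here stands in for a dropped connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {
    // Submits `connections` tasks that block until released and returns
    // {threads created, requests queued, requests rejected}.
    static int[] simulate(int minWorkerThreads, int maxWorkerThreads,
                          int maxQueuedRequests, int connections) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                minWorkerThreads, maxWorkerThreads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < connections; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();  // simulate a long-lived connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;  // pool and queue are both full
            }
        }
        int[] result = {pool.getPoolSize(), pool.getQueue().size(), rejected};
        release.countDown();  // let every blocked task finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = simulate(2, 4, 3, 10);
        System.out.println("threads=" + r[0] + " queued=" + r[1] + " rejected=" + r[2]);
    }
}
```

With minWorkerThreads=2, maxWorkerThreads=4 and maxQueuedRequests=3, ten simultaneous blocking requests leave 4 threads, 3 queued requests and 3 rejections, so the thread count stays bounded no matter how many clients connect.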





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there, they cannot be processed.
> Restart the master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly. I have not yet 
> dug in fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 
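
A minimal sketch of why a doubly counted batch.installed would make the wait hang, assuming the wait loop exits only when the done count equals the installed count. TaskBatchSketch, TaskBatch and run are hypothetical stand-ins, not the SplitLogManager code, and the timeout exists only so the hang can be observed.

```java
class TaskBatchSketch {
    // Hypothetical stand-in for a batch of log-splitting tasks.
    static final class TaskBatch {
        int installed;  // tasks the waiter expects to finish
        int done;       // tasks that have actually finished
    }

    // Waits until done == installed; returns false if the timeout elapses first.
    static boolean waitForTasks(TaskBatch batch, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        synchronized (batch) {
            while (batch.done != batch.installed) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;  // the real loop would keep waiting here
                }
                try {
                    batch.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    // Installs one real task, optionally counting it twice, then waits for it.
    static boolean run(boolean countTwice) {
        TaskBatch batch = new TaskBatch();
        synchronized (batch) {
            batch.installed += countTwice ? 2 : 1;
        }
        Thread worker = new Thread(() -> {
            synchronized (batch) {
                batch.done++;  // the single task completes exactly once
                batch.notifyAll();
            }
        });
        worker.start();
        return waitForTasks(batch, 200);
    }

    public static void main(String[] args) {
        System.out.println("counted once:  completed=" + run(false));
        System.out.println("counted twice: completed=" + run(true));
    }
}
```

When the task is counted once the wait completes; when it is counted twice, done can never catch up with installed, which is the shape of the hang reported here.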





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156923#comment-13156923
 ] 

Hadoop QA commented on HBASE-4863:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12505038/0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 67 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.util.TestThreads

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/364//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/364//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/364//console

This message is automatically generated.

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch
>
>





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156907#comment-13156907
 ] 

Hudson commented on HBASE-4855:
---

Integrated in HBase-TRUNK #2481 (See 
[https://builds.apache.org/job/HBase-TRUNK/2481/])
HBASE-4855  SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java


> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Attachment: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch

The same as D531.3.patch but generated using "git format-patch --no-prefix 
HEAD^..HEAD" so that it can be applied using the normal patch command.

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch
>
>





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4863:
--

Status: Patch Available  (was: Open)

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: 
> 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, 
> D531.2.patch, D531.3.patch
>
>





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.3.patch

mbautin updated the revision "[jira] [HBASE-4863] Make HBase Thrift server more 
configurable and add a command-line UI test".
Reviewers: JIRA, Kannan, tedyu, stack

  Addressing Ted's comments. I will re-run unit tests and cluster tests, and 
post an update.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/main/resources/hbase-default.xml
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java


> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch, D531.3.patch
>
>





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156883#comment-13156883
 ] 

Phabricator commented on HBASE-4863:


tedyu has commented on the revision "[jira] [HBASE-4863] Make HBase Thrift 
server more configurable and add a command-line UI test".

  Should similar changes in thrift/ThriftServer.java be applied to 
thrift2/ThriftServer.java?

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:111
  Should this become a parameter the user can adjust?
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:263
  Should ttx.getType() be logged?
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java:179
  Should read 'Exactly one '

REVISION DETAIL
  https://reviews.facebook.net/D531


> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch
>
>





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156873#comment-13156873
 ] 

Ted Yu commented on HBASE-4855:
---

Failed test was due to 'Too many open files'

Patch integrated to 0.92 and TRUNK.

Thanks for the patch Ramkrishna.

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4855:
--

Summary: SplitLogManager hangs on cluster restart due to batch.installed 
doubly counted  (was: SplitLogManager hangs on cluster restart. )

> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156869#comment-13156869
 ] 

Hadoop QA commented on HBASE-4855:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12505020/HBASE-4855.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/363//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/363//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/363//console

This message is automatically generated.

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156866#comment-13156866
 ] 

Ted Yu commented on HBASE-4862:
---

@Chunhui:
Can you attach master and region server log snippets that show us what 
happened?

Thanks

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries.
> 2. The region server is opening region A now, and during 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IO exception and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> data for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. The filesystem is of 
> course OK, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split-hlog thread is still appending to.
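
One possible way to keep a reader from touching a file that is still being appended, sketched on the local filesystem with java.nio.file as a stand-in for HDFS: the writer appends under a temporary name and renames the file once it is complete, and the reader skips temporary names. The ".temp" suffix, RecoveredEditsSketch and its methods are hypothetical and not taken from the attached patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class RecoveredEditsSketch {
    // Hypothetical marker for files the split thread is still writing.
    static final String TEMP_SUFFIX = ".temp";

    // Lists only the files a region open would be allowed to replay and delete.
    static List<String> replayableFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (!name.endsWith(TEMP_SUFFIX)) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Shows what the reader sees while the writer is appending and afterwards.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("recovered.edits");
            Path temp = Files.createFile(dir.resolve("123456" + TEMP_SUFFIX));
            Files.write(temp, "edit-1\n".getBytes(StandardCharsets.UTF_8));
            List<String> during = replayableFiles(dir);  // writer still appending
            Path done = Files.move(temp, dir.resolve("123456"),
                    StandardCopyOption.ATOMIC_MOVE);     // writer finished
            List<String> after = replayableFiles(dir);
            Files.delete(done);
            Files.delete(dir);
            return during + " -> " + after;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This only hides in-progress files from the reader; it does not explain why a region would be opened before its log split has finished.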





[jira] [Resolved] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Ted Yu (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-4856.
---

Resolution: Fixed

> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>
> Zookeeper 3.4.0 has been released.
> We should upgrade.





[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156858#comment-13156858
 ] 

Todd Lipcon commented on HBASE-4862:


wait, wait -- _why_ is this happening concurrently? A region should never be 
opened until the split process is done for that region. If this is happening we 
have a much larger issue, which we shouldn't be working around with tmp file 
names, etc.

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156845#comment-13156845
 ] 

Hudson commented on HBASE-4856:
---

Integrated in HBase-TRUNK #2479 (See 
[https://builds.apache.org/job/HBase-TRUNK/2479/])
HBASE-4856  Upgrade zookeeper to 3.4.0 release

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml


> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156844#comment-13156844
 ] 

Hudson commented on HBASE-4864:
---

Integrated in HBase-TRUNK #2479 (See 
[https://builds.apache.org/job/HBase-TRUNK/2479/])
HBASE-4864  TestMasterObserver#testRegionTransitionOperations occasionally
   fails (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java


> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region set.
> I made a patch; please review.
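
The wait suggested above can be sketched as a simple poll with a timeout. This is only an illustration under assumptions: the class WaitForOnlineSketch, the method waitUntilOnline, and the use of a plain Set of region names are hypothetical and are not taken from the attached patch or from TestMasterObserver.

```java
import java.util.Set;

class WaitForOnlineSketch {
  // Polls until regionName shows up in onlineRegions, or gives up after timeoutMs.
  // Returns true if the region was seen online, false on timeout or interrupt.
  static boolean waitUntilOnline(Set<String> onlineRegions, String regionName,
      long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!onlineRegions.contains(regionName)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out: the region never came online
      }
      try {
        Thread.sleep(10); // short pause between checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

A test would call waitUntilOnline() before asserting on the region state, and fail with a clear message when it returns false.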





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156830#comment-13156830
 ] 

Ted Yu commented on HBASE-4855:
---

+1 on patch.

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Issue Comment Edited] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821
 ] 

Ted Yu edited comment on HBASE-4863 at 11/24/11 5:14 PM:
-

I got a compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  
Time elapsed: 2.047 sec  <<< ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method 
getColumnDescriptors() from the type TestThriftServer

  at 
org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TBoundedThreadPoolServer (TThreadPoolServer is in thrift).

  was (Author: yuzhih...@gmail.com):
I got a compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  
Time elapsed: 2.047 sec  <<< ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method 
getColumnDescriptors() from the type TestThriftServer

  at 
org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TThreadPoolServer.
  
> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.
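
The pool behaviour described above (new threads up to minWorkerThreads, then queueing, then more threads only when the queue is full) happens to match how java.util.concurrent.ThreadPoolExecutor treats its core size, bounded queue and maximum size. A minimal sketch, assuming that mapping; BoundedPoolSketch and newBoundedPool are hypothetical names and this is not the HBaseThreadPoolServer code from the patch:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolSketch {
  // corePoolSize = minWorkerThreads, maximumPoolSize = maxWorkerThreads,
  // queue capacity = maxQueuedRequests. The 60 second keep-alive is an
  // arbitrary value chosen for the sketch.
  static ThreadPoolExecutor newBoundedPool(int minWorkerThreads,
      int maxWorkerThreads, int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads, maxWorkerThreads,
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(maxQueuedRequests));
  }
}
```

With the default rejection policy, execute() throws RejectedExecutionException once both the queue and the pool are full; a server loop could catch that and close the client connection instead of growing without bound.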





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156828#comment-13156828
 ] 

Phabricator commented on HBASE-4863:


tedyu has commented on the revision "[jira] [HBASE-4863] Make HBase Thrift 
server more configurable and add a command-line UI test".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:64
  Please add javadoc for the keys.
  These keys should be placed into hbase-default.xml.
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:80
  Is TIME_TO_WAIT_AFTER_SHUTDOWN_MS a better name for this constant?

REVISION DETAIL
  https://reviews.facebook.net/D531


> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Status: Patch Available  (was: Open)

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Attachment: HBASE-4855.patch

TestDistributedLogSplitting is passing.  Results for the other test cases will come 
in the morning.

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-4855.patch
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156822#comment-13156822
 ] 

Ted Yu commented on HBASE-4855:
---

The above analysis makes sense.

Nice catch Ramkrishna.

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821
 ] 

Ted Yu commented on HBASE-4863:
---

I got a compilation error:
{code}
testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine)  
Time elapsed: 2.047 sec  <<< ERROR!
java.lang.Error: Unresolved compilation problem:
  Cannot make a static reference to the non-static method 
getColumnDescriptors() from the type TestThriftServer

  at 
org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111)
{code}

Since HBaseThreadPoolServer extends TServer, I think a better name for the 
class would be TThreadPoolServer.
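
For readers unfamiliar with the error quoted above: it is reported when a static method calls an instance method without an object. A minimal sketch of one way to resolve it, assuming the static caller can be handed an instance; StaticReferenceSketch and its method bodies are hypothetical and do not reproduce TestThriftServer:

```java
import java.util.Arrays;
import java.util.List;

class StaticReferenceSketch {
  // Instance method: needs an object to be called on.
  List<String> getColumnDescriptors() {
    return Arrays.asList("columnA:", "columnB:");
  }

  // Calling getColumnDescriptors() directly from this static method would
  // not compile; passing an instance in makes the call legal.
  static int createDropTable(StaticReferenceSketch test) {
    return test.getColumnDescriptors().size();
  }
}
```

The alternative is to make both methods static or both non-static, whichever fits how the test is driven.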

> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156820#comment-13156820
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

When the master restarts and sees splitlog nodes that have not been processed, the 
SplitLogManager runs handleUnassignedTasks:
{code}
Task task = findOrCreateOrphanTask(path);
{code}
As part of that call,
{code}
task = tasks.putIfAbsent(path, orphanTask);
{code}
the orphan task is added.  Later, in splitLogDistributed(), we try to installTask().

Here we create the task if it is absent:
{code}
Task oldtask = createTaskIfAbsent(path, batch);
{code}
Inside createTaskIfAbsent():
{code}
oldtask = tasks.putIfAbsent(path, new Task(batch));
if (oldtask != null && oldtask.isOrphan()) {
  LOG.info("Previously orphan task " + path +
      " is now being waited upon");
  oldtask.setBatch(batch);
  return (null);
}
{code}
putIfAbsent returns the task that was already added, so oldtask is not null.
But new Task(batch) has already run by then:
{code}
Task(TaskBatch tb) {
  incarnation = 0;
  last_version = -1;
  deleted = false;
  setBatch(tb);
  setUnassigned();
}

public void setBatch(TaskBatch batch) {
  if (batch != null && this.batch != null) {
    LOG.fatal("logic error - batch being overwritten");
  }
  this.batch = batch;
  if (batch != null) {
    batch.installed++;
  }
}
{code}
so batch.installed++ has already happened once.  Since oldtask is not null, we then call
oldtask.setBatch(batch), which increments batch.installed a second time.

This is why batch.done can never reach batch.installed, and hence the 
while loop keeps looping:
{code}
while ((batch.done + batch.error) != batch.installed) {
{code}

Please correct me if my analysis is wrong.  I am uploading a patch which solved 
the problem.  Kindly validate the fix.
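
The double count in the analysis above can be reproduced in isolation. The following is a minimal sketch with simplified stand-ins: DoubleCountSketch and its nested TaskBatch and Task classes are hypothetical, isOrphan() is reduced to a null check, and none of this is the attached patch.

```java
import java.util.concurrent.ConcurrentMap;

class DoubleCountSketch {
  // Stand-in for the batch counters used by the wait loop.
  static class TaskBatch {
    int installed = 0;
  }

  // Stand-in for Task: every non-null setBatch() bumps batch.installed.
  static class Task {
    TaskBatch batch;

    Task(TaskBatch tb) {
      setBatch(tb);
    }

    void setBatch(TaskBatch batch) {
      this.batch = batch;
      if (batch != null) {
        batch.installed++;
      }
    }

    boolean isOrphan() {
      return batch == null; // simplification made for this sketch
    }
  }

  // Mirrors the shape of the quoted createTaskIfAbsent(): the new Task is
  // constructed (first increment) even when an orphan already exists, and
  // the orphan is then attached to the batch (second increment).
  static int installTask(ConcurrentMap<String, Task> tasks, String path,
      TaskBatch batch) {
    Task oldtask = tasks.putIfAbsent(path, new Task(batch));
    if (oldtask != null && oldtask.isOrphan()) {
      oldtask.setBatch(batch);
    }
    return batch.installed;
  }
}
```

With an orphan task already in the map, a single installTask() call leaves batch.installed at 2 for one real task, so a loop waiting for done + error to equal installed cannot terminate.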


> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Fix Version/s: (was: 0.92.0)

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4855:
--

Affects Version/s: 0.92.0
Fix Version/s: 0.92.0

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly.  I have not yet dug 
> into it fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Commented] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline

2011-11-24 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156779#comment-13156779
 ] 

Jonathan Hsieh commented on HBASE-4866:
---


Looks like it corresponds to this line, which is AssignmentManager:724 on the 
0.90 branch:

{code}
HServerInfo hsiWithoutLoad = new HServerInfo(
    serverInfo.getServerAddress(), serverInfo.getStartCode(),
    serverInfo.getInfoPort(), serverInfo.getHostname());
{code}   
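
If the NPE comes from serverInfo being null on that line (an assumption; the stack trace alone does not say which reference is null), a guard of the following shape would avoid the unhandled exception. ServerInfoStub, RegionOnlineGuardSketch and copyWithoutLoad are hypothetical stand-ins, not HBase classes or the eventual fix.

```java
// Hypothetical stand-in for HServerInfo with only two of its fields.
class ServerInfoStub {
  final String hostname;
  final long startCode;

  ServerInfoStub(String hostname, long startCode) {
    this.hostname = hostname;
    this.startCode = startCode;
  }
}

class RegionOnlineGuardSketch {
  // Returns a copy of serverInfo, or null when the server is unknown, so the
  // caller can skip or retry instead of dereferencing a null reference.
  static ServerInfoStub copyWithoutLoad(ServerInfoStub serverInfo) {
    if (serverInfo == null) {
      return null;
    }
    return new ServerInfoStub(serverInfo.hostname, serverInfo.startCode);
  }
}
```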

> Fix possible NPE in AssignmentManager#regionOnline
> --
>
> Key: HBASE-4866
> URL: https://issues.apache.org/jira/browse/HBASE-4866
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.4
>Reporter: Jonathan Hsieh
>
> NPE encountered in a user's HMaster logs:
> {code}
> 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting 
> shutdown.
> java.lang.NullPointerException
>at 
> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
>at 
> org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
>at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
>at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
> {code}
> From user list: 
> http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E





[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156776#comment-13156776
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-TRUNK #2478 (See 
[https://builds.apache.org/job/HBase-TRUNK/2478/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


> Fix some misspells and extraneous characters in logs; set some to TRACE
> ---
>
> Key: HBASE-4861
> URL: https://issues.apache.org/jira/browse/HBASE-4861
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 4861.txt
>
>
> Some small clean up in logs.





[jira] [Created] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline

2011-11-24 Thread Jonathan Hsieh (Created) (JIRA)
Fix possible NPE in AssignmentManager#regionOnline
--

 Key: HBASE-4866
 URL: https://issues.apache.org/jira/browse/HBASE-4866
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Jonathan Hsieh


NPE encountered in a user's HMaster logs:

{code}
11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown.
java.lang.NullPointerException
   at 
org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731)
   at 
org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215)
   at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295)
{code}

From user list: 
http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E






[jira] [Commented] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156771#comment-13156771
 ] 

Ted Yu commented on HBASE-4865:
---

w.r.t. HBaseAdmin#createTable[Async] methods, see HBASE-3904 and HBASE-3229
We don't need to change their implementation now.

> HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as 
> asynchronous but are actually synchronous.
> -
>
> Key: HBASE-4865
> URL: https://issues.apache.org/jira/browse/HBASE-4865
> Project: HBase
>  Issue Type: Bug
>  Components: client, master
>Affects Versions: 0.94.0
> Environment: all
>Reporter: nkeywal
>Priority: Minor
>
> The javadoc states that these methods are asynchronous, but we can see in the 
> implementation on HMaster that it does not use executorService but calls 
> process() directly. This is not true for all methods: enableTable, 
> modifyTable, disableTable are truly asynchronous.
> The other impact is that the listeners are not called, as this is done by the 
> executorService.
> I don't know whether we have to change the documentation or the implementation. 
> For consistency, I would change the implementation, but it may break 
> existing code.
> Two other comments:
> 1) There is no real naming pattern here, while it would be useful:
> HBaseAdmin#createTable is synchronous and calls the asynchronous 
> HMaster#createTable 
> HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous 
> HMaster#createTable 
> HBaseAdmin#modifyTable is asynchronous and calls the asynchronous 
> HMaster#modifyTable 
> HBaseAdmin#modifyColumn is documented as asynchronous and calls the 
> synchronous HMaster#modifyColumn
> 2) the coprocessor "post" semantic is not consistent across the services.
> - when the service is synchronous, post is called after the service's 
> execution (ex: addColumn with the current implementation).
> - when the service is asynchronous, post is called after the executorService 
> has registered the service to execute, but the service itself is not executed 
> yet.
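
The difference between the two call styles described above can be sketched as follows. SyncVsAsyncSketch, Handler, runSynchronously and runAsynchronously are hypothetical names; the real HMaster handlers and executorService have more to them than this.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class SyncVsAsyncSketch {
  // Stand-in for a master-side handler with a process() method.
  interface Handler {
    void process();
  }

  // Synchronous style: the work is finished when this method returns, so a
  // "post" hook called afterwards runs after execution.
  static void runSynchronously(Handler handler) {
    handler.process();
  }

  // Asynchronous style: the work is only registered here; a "post" hook
  // called right after this returns may run before process() has executed.
  static Future<?> runAsynchronously(ExecutorService pool, Handler handler) {
    Runnable work = handler::process;
    return pool.submit(work);
  }
}
```

A caller of the asynchronous variant has to wait on the returned Future (or poll some state) before assuming the change is visible.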





[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156770#comment-13156770
 ] 

Ted Yu commented on HBASE-4856:
---

Integrated to 0.92 and TRUNK after verifying that 3.4.0 artifacts could be 
pulled.

> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>
> Zookeeper 3.4.0 has been released.
> We should upgrade.





[jira] [Assigned] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4862:
-

Assignee: chunhui shen

> Split hlog and open region concurrently happend may cause data loss
> ---
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 
> and is appending log entries.
> 2. The regionserver is opening region A now, and in replayRecoveredEditsIfAny() 
> it deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log file; 
> if skipError = true, it adds the file to the corrupt logs. However, data in 
> other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the file 
> system is OK, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files will also be lost!
> The case may happen in the following scenario:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split hlog thread is still appending to.
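
The proposal above, refusing to delete a recovered.edits file while it is still being written, could look roughly like the sketch below. RecoveredEditsGuardSketch and its methods are hypothetical and are not the attached 4862.patch; a real fix would also have to coordinate across processes and close the window between the check and the delete.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RecoveredEditsGuardSketch {
  // Paths that a log-splitting writer currently has open for append.
  private final Set<String> beingWritten = ConcurrentHashMap.newKeySet();

  // Called by the splitter when it creates a writer for path.
  void beginWrite(String path) {
    beingWritten.add(path);
  }

  // Called by the splitter when it closes the writer for path.
  void endWrite(String path) {
    beingWritten.remove(path);
  }

  // Called by the region-open path before deleting a replayed file.
  boolean mayDelete(String path) {
    return !beingWritten.contains(path);
  }
}
```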





[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Fix Version/s: 0.90.5
   0.94.0
   0.92.0

> Split hlog and open region concurrently happend may cause data loss
> ---
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. A regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split-hlog thread is still appending to.





[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4862:
--

Summary: Splitting hlog and opening region concurrently may cause data loss 
 (was: Split hlog and open region concurrently happend may cause data loss)

> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. A regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split-hlog thread is still appending to.





[jira] [Commented] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156765#comment-13156765
 ] 

Ted Yu commented on HBASE-4862:
---

Nice work.
The patch doesn't apply to 0.90 branch:
{code}
Hunk #4 succeeded at 783 (offset -332 lines).
1 out of 4 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java.rej
...
patch unexpectedly ends in middle of line
2 out of 2 hunks ignored -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej
{code}
Please rebase your patch for 0.90

A separate patch for TRUNK would be helpful for HadoopQA to run test suite.

Comments about the changes:
getTmpRecoveredEditsFileName() is only used once and there is no javadoc for 
it. Maybe we don't need to create the method, just append ".tmp" directly to 
the filename.
{code}
+// Convert file name ends with .tmp, so ensure region's 
replayRecoveredEdits
{code}
The beginning of the above should read 'Append filename with '.tmp' to ensure'
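
The ".tmp" idea discussed in this comment can be sketched as follows. This is 
only an illustration under stated assumptions, not the attached patch: it uses 
the local filesystem through java.nio rather than HDFS, and the class name 
TmpRecoveredEdits, the constant TMP_SUFFIX, and both method names are 
hypothetical. The writer appends to "<name>.tmp" and renames the file once it 
is complete; the reader skips anything that still carries the suffix, so it 
never replays or deletes a file that is being written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names are illustrative and not taken from HBase.
final class TmpRecoveredEdits {
  static final String TMP_SUFFIX = ".tmp";

  /** Writer side: write edits to "<name>.tmp", then rename once complete. */
  static Path writeEdits(Path dir, String name, List<String> edits) throws IOException {
    Files.createDirectories(dir);
    Path tmp = dir.resolve(name + TMP_SUFFIX);
    Files.write(tmp, edits, StandardCharsets.UTF_8);
    // Only a fully written file ever appears under its final name.
    return Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
  }

  /** Reader side: list replayable files, skipping any still in progress. */
  static List<Path> replayableFiles(Path dir) throws IOException {
    List<Path> result = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
      for (Path p : stream) {
        if (!p.getFileName().toString().endsWith(TMP_SUFFIX)) {
          result.add(p);
        }
      }
    }
    Collections.sort(result);
    return result;
  }
}
```

Whether a rename is atomic on the target filesystem is an assumption that 
would need to be checked for HDFS separately.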

> Split hlog and open region concurrently happend may cause data loss
> ---
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: chunhui shen
> Attachments: 4862.patch
>
>
> Case description:
> 1. The split-hlog thread creates a writer for the file region 
> A/recovered.edits/123456 and is appending log entries to it.
> 2. A regionserver is opening region A at the same time, and in 
> replayRecoveredEditsIfAny() it deletes the file region 
> A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log 
> file. If skipError = true, it adds the file to the corrupt logs. However, 
> the data for other regions in this log file is lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of 
> course fine, so it only prints an error log and continues assigning regions. 
> Therefore, data in other log files is also lost.
> The case may happen as follows:
> 1. Move a region from server A to server B.
> 2. Kill server A and server B.
> 3. Restart server A and server B.
> We could prevent this exception by forbidding deletion of a recovered.edits 
> file that the split-hlog thread is still appending to.





[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156753#comment-13156753
 ] 

Ted Yu commented on HBASE-4864:
---

Integrated to 0.92 and TRUNK.

Thanks for the patch Jinchao.

> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

  Issue Type: Test  (was: Bug)
Hadoop Flags: Reviewed

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Summary: TestMasterObserver#testRegionTransitionOperations occasionally 
fails  (was: testRegionTransitionOperations occasional failures)

> TestMasterObserver#testRegionTransitionOperations occasionally fails
> 
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Assigned] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4864:
-

Assignee: gaojinchao

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.

2011-11-24 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156732#comment-13156732
 ] 

ramkrishna.s.vasudevan commented on HBASE-4855:
---

batch.installed was getting incremented twice. I will upload the patch shortly 
for review. I will report the test results tomorrow morning, as the test run 
takes time.
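
To illustrate why a doubly counted batch.installed leads to a hang, here is a 
minimal model of a batch that waits until its done count matches its installed 
count. It is a sketch under assumptions, not the SplitLogManager code: the 
class TaskBatch and its methods are hypothetical, and a timeout is added so 
the stalled case returns instead of blocking forever.

```java
// Hypothetical model of a task batch; not the actual SplitLogManager class.
final class TaskBatch {
  private int installed; // tasks registered for this batch
  private int done;      // tasks reported as finished

  synchronized void taskInstalled() {
    installed++;
  }

  synchronized void taskDone() {
    done++;
    notifyAll();
  }

  /** Waits until done == installed or the timeout elapses; true if the batch completed. */
  synchronized boolean waitForTasks(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (done != installed) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false; // a waiter without a timeout would block here indefinitely
      }
      wait(remaining);
    }
    return true;
  }
}
```

If the same task is counted twice by taskInstalled() but finishes only once, 
done can never reach installed, which matches the symptom reported in the 
issue.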

> SplitLogManager hangs on cluster restart. 
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>
> Start a master and RS
> RS goes down (kill -9)
> Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is 
> there it cannot be processed.
> Restart both master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly. I have not yet 
> dug in fully.
> This may be the reason for occasional failure of 
> TestDistributedLogSplitting.testWorkerAbort(). 





[jira] [Created] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.

2011-11-24 Thread nkeywal (Created) (JIRA)
HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous 
but are actually synchronous.
-

 Key: HBASE-4865
 URL: https://issues.apache.org/jira/browse/HBASE-4865
 Project: HBase
  Issue Type: Bug
  Components: client, master
Affects Versions: 0.94.0
 Environment: all
Reporter: nkeywal
Priority: Minor


The javadoc states that these methods are asynchronous, but the implementation 
on HMaster does not use executorService; it calls process() directly. This is 
not true for all methods: enableTable, modifyTable, and disableTable are truly 
asynchronous.

The other impact is that the listeners are not called, as this is done by the 
executorService.


I don't know whether we have to change the documentation or the implementation. 
For consistency, I would change the implementation, but it may break existing 
code.


Two other comments:
1) There is no real naming pattern here, while it would be useful:
HBaseAdmin#createTable is synchronous and calls the asynchronous 
HMaster#createTable 
HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous 
HMaster#createTable 
HBaseAdmin#modifyTable is asynchronous and calls the asynchronous 
HMaster#modifyTable 
HBaseAdmin#modifyColumn is documented as asynchronous and calls the 
synchronous HMaster#modifyColumn

2) The coprocessor "post" semantics are not consistent across the services.
- When the service is synchronous, post is called after the service's 
execution (e.g. addColumn with the current implementation).
- When the service is asynchronous, post is called after the executorService 
has registered the service to execute, but before the service itself has 
necessarily executed.
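
The difference between the two call styles can be sketched with plain 
java.util.concurrent classes. This is an illustration only: SyncVsAsync, 
Handler, and the "post" marker are hypothetical stand-ins, not HMaster or 
coprocessor code. A latch is used to make the asynchronous ordering 
deterministic for the demonstration; in a real system the order is simply not 
guaranteed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from HBase.
final class SyncVsAsync {
  /** Stand-in for a handler whose process() performs the schema change. */
  static final class Handler {
    private final List<String> log;

    Handler(List<String> log) {
      this.log = log;
    }

    void process() {
      log.add("process");
    }
  }

  /** Synchronous style: process() runs on the caller's thread, then "post". */
  static List<String> callDirectly() {
    List<String> log = new ArrayList<>();
    new Handler(log).process();
    log.add("post");
    return log;
  }

  /** Asynchronous style: "post" is recorded after submission, before the work runs. */
  static List<String> callViaExecutor() throws Exception {
    List<String> log = Collections.synchronizedList(new ArrayList<>());
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      CountDownLatch gate = new CountDownLatch(1);
      Future<Object> pending = executor.submit(() -> {
        gate.await(); // hold the work back until "post" has been recorded
        new Handler(log).process();
        return null;
      });
      log.add("post");
      gate.countDown();
      pending.get(); // wait for the handler so the returned log is complete
    } finally {
      executor.shutdown();
    }
    return log;
  }
}
```

In the first method the log reads process, post; in the second it reads post, 
process, which is the inconsistency in "post" semantics described above.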







[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156721#comment-13156721
 ] 

Hudson commented on HBASE-4861:
---

Integrated in HBase-TRUNK-security #8 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/8/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some 
to TRACE

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java


> Fix some misspells and extraneous characters in logs; set some to TRACE
> ---
>
> Key: HBASE-4861
> URL: https://issues.apache.org/jira/browse/HBASE-4861
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 4861.txt
>
>
> Some small clean up in logs.





[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156710#comment-13156710
 ] 

Hadoop QA commented on HBASE-4864:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12505006/HBASE-4864_Branch92.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -162 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 66 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestInstantSchemaChange
  org.apache.hadoop.hbase.client.TestAdmin

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/362//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/362//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/362//console

This message is automatically generated.

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156654#comment-13156654
 ] 

Hudson commented on HBASE-4789:
---

Integrated in HBase-TRUNK #2477 (See 
[https://builds.apache.org/job/HBase-TRUNK/2477/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


> On split, parent region is sticking around in oldest sequenceid to region map 
> though not online; we don't cleanup WALs.
> ---
>
> Key: HBASE-4789
> URL: https://issues.apache.org/jira/browse/HBASE-4789
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt
>
>
> Here is log for a particular region:
> {code}
> 2011-11-15 05:46:31,382 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to 
> transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to 
> RS_ZK_REGION_SPLIT
> 2011-11-15 05:46:31,484 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META 
> updated, and report to master. 
> Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., 
> new regions: 
> TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., 
> TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8.. Split 
> took 1sec
> 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top,
>  keycount=717559, bloomtype=NONE, size=711.1m
> 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top,
>  keycount=416691, bloomtype=NONE, size=412.9m
> 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom,
>  keycount=717559, bloomtype=NONE, size=711.1m
> 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom,
>  keycount=416691, bloomtype=NONE, size=412.9m
> 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is 
> 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 
> 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: 
> Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, 
> requester=null
> 201

[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156653#comment-13156653
 ] 

Hudson commented on HBASE-4853:
---

Integrated in HBase-TRUNK #2477 (See 
[https://builds.apache.org/job/HBase-TRUNK/2477/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


> HBASE-4789 does overzealous pruning of seqids
> -
>
> Key: HBASE-4853
> URL: https://issues.apache.org/jira/browse/HBASE-4853
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, 
> 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, 
> 4853-v9.txt, 4853.txt
>
>
> Working w/ J-D on failing replication test turned up hole in seqids made by 
> the patch over in hbase-4789.  With this patch in place we see lots of 
> instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
> old hlogs'
> At a minimum, these lines need removing:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 
> b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> index 623edbe..a0bbe01 100644
> --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
>// Cleaning up of lastSeqWritten is in the finally clause because we
>// don't want to confuse getOldestOutstandingSeqNum()
>this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
> -  Long l = this.lastSeqWritten.remove(encodedRegionName);
> -  if (l != null) {
> -LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? 
> name=" +
> -  Bytes.toString(encodedRegionName) + ", seqid=" + l);
> -   }
>this.cacheFlushLock.unlock();
>  }
>}
> {code}
> ... but above is no good w/o figuring why WALs are not being rotated off.
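
As a rough model of why a hole in the tracked seqids is dangerous, the sketch 
below keeps a map from region to its oldest unflushed sequence id and treats a 
log as removable only when all of its edits are older than the oldest 
outstanding id. It is a simplification under assumptions, not the HLog code: 
SeqIdTracker and its methods are hypothetical, and the real lastSeqWritten 
bookkeeping is more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplification of per-region sequence id tracking.
final class SeqIdTracker {
  // region name -> oldest sequence id written but not yet flushed
  private final Map<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

  /** Records an edit; only the first unflushed seqid per region is kept. */
  void append(String region, long seqId) {
    lastSeqWritten.putIfAbsent(region, seqId);
  }

  /** Drops the region's entry once its edits have been flushed. */
  void flushed(String region) {
    lastSeqWritten.remove(region);
  }

  /** Smallest outstanding seqid, or Long.MAX_VALUE when nothing is tracked. */
  long oldestOutstandingSeqNum() {
    long min = Long.MAX_VALUE;
    for (long seqId : lastSeqWritten.values()) {
      min = Math.min(min, seqId);
    }
    return min;
  }

  /** A log is removable when its highest seqid is below every outstanding one. */
  boolean canDeleteLog(long highestSeqIdInLog) {
    return highestSeqIdInLog < oldestOutstandingSeqNum();
  }
}
```

In this model, removing an entry too eagerly makes the map look empty, so 
every log appears removable even though unflushed edits may still depend on 
it.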





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4864:
--

Status: Patch Available  (was: Open)

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156646#comment-13156646
 ] 

Ted Yu commented on HBASE-4864:
---

+1 on patch. 

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread gaojinchao (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-4864:
--

Attachment: HBASE-4864_Branch92.patch

> testRegionTransitionOperations occasional failures
> --
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> See these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region 
> set.
> I made a patch; please review.





[jira] [Created] (HBASE-4864) testRegionTransitionOperations occasional failures

2011-11-24 Thread gaojinchao (Created) (JIRA)
testRegionTransitionOperations occasional failures
--

 Key: HBASE-4864
 URL: https://issues.apache.org/jira/browse/HBASE-4864
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: gaojinchao
Priority: Minor
 Fix For: 0.92.0, 0.94.0


See these logs:
https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/

It seems that we should wait until the region is added to the online region set.

I made a patch; please review.






[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156634#comment-13156634
 ] 

Hudson commented on HBASE-4856:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4856  Upgrade zookeeper to 3.4.0 release - revert, Apache maven 
repository not ready
HBASE-4856  Upgrade zookeeper to 3.4.0 release

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/pom.xml


> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
>  Issue Type: Task
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>
> Zookeeper 3.4.0 has been released.
> We should upgrade.





[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156635#comment-13156635
 ] 

Hudson commented on HBASE-4789:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


> On split, parent region is sticking around in oldest sequenceid to region map 
> though not online; we don't cleanup WALs.
> ---
>
> Key: HBASE-4789
> URL: https://issues.apache.org/jira/browse/HBASE-4789
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt
>
>
> Here is log for a particular region:
> {code}
> 2011-11-15 05:46:31,382 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the 
> master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to 
> transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to 
> RS_ZK_REGION_SPLIT
> 2011-11-15 05:46:31,484 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META 
> updated, and report to master. 
> Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., 
> new regions: 
> TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., 
> TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8.. Split 
> took 1sec
> 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top,
>  keycount=717559, bloomtype=NONE, size=711.1m
> 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top,
>  keycount=416691, bloomtype=NONE, size=412.9m
> 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom,
>  keycount=717559, bloomtype=NONE, size=711.1m
> 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
> Compacting 
> hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d-
> hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom,
>  keycount=416691, bloomtype=NONE, size=412.9m
> 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is 
> 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: 
> Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): 
> 8bbd7388262dc8cb1ce2cf4f04a7281d
> 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: 
> Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, 
> requeste

[jira] [Commented] (HBASE-4772) Utility to Create StoreFiles

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156633#comment-13156633
 ] 

Hudson commented on HBASE-4772:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4772 Utility to Create StoreFiles

karthik : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java


> Utility to Create StoreFiles
> 
>
> Key: HBASE-4772
> URL: https://issues.apache.org/jira/browse/HBASE-4772
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.0
>Reporter: Nicolas Spiegelberg
>Assignee: Mikhail Bautin
>Priority: Minor
> Fix For: 0.94.0
>
> Attachments: HBASE-4772-B.patch, HBASE-4772.patch
>
>
> Add a tool to create a StoreFile with the specified number of key/value 
> pairs, with the specified compression and Bloom filter type.  This is useful 
> for creating HFileV1 & HFileV2 store files for testing.





[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156636#comment-13156636
 ] 

Hudson commented on HBASE-4857:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4857  Recursive loop on KeeperException in 
AuthenticationTokenSecretManager

garyh : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java


> Recursive loop on KeeperException in 
> AuthenticationTokenSecretManager/ZKLeaderManager
> -
>
> Key: HBASE-4857
> URL: https://issues.apache.org/jira/browse/HBASE-4857
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Affects Versions: 0.92.0, 0.94.0
>Reporter: Gary Helmling
>Assignee: Gary Helmling
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-4857.patch
>
>
> Looking through stack traces for {{TestMasterFailover}}, I see a case where 
> the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop 
> when a {{KeeperException}} is encountered:
> {noformat}
> "Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 
> waiting on condition [0x7f9fab376000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at java.lang.Thread.sleep(Thread.java:302)
> at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
> at 
> org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154)
> at 
> org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397)
> at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435)
> at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166)
> at 
> org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
> at 
> org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167)
> at 
> org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78)
> at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
> {noformat}
> The {{KeeperException}} causes {{ZKLeaderManager}} to call 
> {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls 
> {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another 
> {{KeeperException}}, and so on...
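
The cycle described above (stop() calls stepDownAsLeader(), whose failure handling calls stop() again) can be broken with a simple re-entrancy guard. The sketch below is an illustration only and is not taken from HBASE-4857.patch: StopOnceDemo is a hypothetical class, and an IllegalStateException stands in for the KeeperException.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class StopOnceDemo {
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    final AtomicInteger stepDownAttempts = new AtomicInteger();

    /** Stand-in for stepping down as leader: always fails, as when ZooKeeper is unreachable. */
    private void stepDownAsLeader() {
        stepDownAttempts.incrementAndGet();
        try {
            throw new IllegalStateException("simulated KeeperException");
        } catch (IllegalStateException e) {
            // The failure handler ends up calling stop() again, which is the loop in the trace.
            stop();
        }
    }

    void stop() {
        if (!stopping.compareAndSet(false, true)) {
            return; // already stopping: break the cycle instead of recursing
        }
        stepDownAsLeader();
    }

    public static void main(String[] args) {
        StopOnceDemo demo = new StopOnceDemo();
        demo.stop();
        System.out.println("step-down attempts: " + demo.stepDownAttempts.get());
    }
}
```

Without the compareAndSet check, the nested stop() call would re-enter stepDownAsLeader() and recurse in the same way the stack trace shows.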





[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156632#comment-13156632
 ] 

Hudson commented on HBASE-4783:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4783 Improve RowCounter to count rows in a specific key range.

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


> Improve RowCounter to count rows in a specific key range.
> -
>
> Key: HBASE-4783
> URL: https://issues.apache.org/jira/browse/HBASE-4783
> Project: HBase
>  Issue Type: Improvement
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
>Priority: Trivial
> Fix For: 0.94.0
>
> Attachments: 4783.txt, HBASE-4783.patch
>
>
> Currently RowCounter in MR package is a very simple map only job that does a 
> full scan of a table. Enhance the utility to let the user specify a key range 
> and count the number of rows in this range. 
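
To illustrate what counting rows in a key range means, here is a small in-memory sketch. It does not use the HBase or MapReduce APIs: a sorted map stands in for the table, RangeRowCount and countRows are hypothetical names, and the start-inclusive, stop-exclusive bounds are an assumption chosen to mirror how HBase scans treat start and stop rows.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeRowCount {
    /** Counts keys in [startRow, stopRow); a null bound means unbounded on that side. */
    static <V> long countRows(NavigableMap<String, V> table, String startRow, String stopRow) {
        NavigableMap<String, V> view = table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);  // start row is inclusive
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);  // stop row is exclusive
        }
        return view.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row" + i, "value" + i);
        }
        System.out.println("full scan: " + countRows(table, null, null));
        System.out.println("row2..row4: " + countRows(table, "row2", "row4"));
    }
}
```

Restricting the range up front means only the rows of interest are visited, instead of scanning the whole table and filtering afterwards.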





[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156628#comment-13156628
 ] 

Hudson commented on HBASE-4739:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4739  Master dying while going to close a region can leave it in 
transition forever (Gao Jinchao)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java


> Master dying while going to close a region can leave it in transition forever
> -
>
> Key: HBASE-4739
> URL: https://issues.apache.org/jira/browse/HBASE-4739
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
>Assignee: gaojinchao
>Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: 4739_trial2.patch, 4739_trialV3.patch, 
> HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, 
> HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, 
> HBASE-4739_trial.patch, HBASE-4739_trial6.patch
>
>
> I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when 
> the master died it had just created the RIT znode for a region but didn't 
> tell the RS to close it yet.
> When the master restarted it saw the znode and started printing this:
> {quote}
> 2011-11-03 00:02:49,130 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
> out:  TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
> state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948
> 2011-11-03 00:02:49,130 INFO 
> org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for 
> too long, this should eventually complete or the server will expire, doing 
> nothing
> {quote}
> It's never going to happen, and it's blocking balancing.
> I'm marking this as minor since I believe this situation is pretty rare 
> unless you hit other bugs while trying out stuff to root bugs out.





[jira] [Commented] (HBASE-4787) Make corePool as a configurable parameter in HTable

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156627#comment-13156627
 ] 

Hudson commented on HBASE-4787:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4787 Rename HTable thread pool

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java


> Make corePool as a configurable parameter in HTable
> ---
>
> Key: HBASE-4787
> URL: https://issues.apache.org/jira/browse/HBASE-4787
> Project: HBase
>  Issue Type: Improvement
>Reporter: Nicolas Spiegelberg
>Priority: Trivial
> Fix For: 0.94.0
>
> Attachments: HBASE-4787.patch
>
>
> Make the corePool a configurable parameter in HTable. So we can tune this 
> parameter in the config file.  While at it, change the core pool name so we 
> can distinguish it from other AppServer pools.





[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156629#comment-13156629
 ] 

Hudson commented on HBASE-4785:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4785 Improve recovery time of HBase client when a region server dies.

nspiegelberg : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java


> Improve recovery time of HBase client when a region server dies.
> 
>
> Key: HBASE-4785
> URL: https://issues.apache.org/jira/browse/HBASE-4785
> Project: HBase
>  Issue Type: Improvement
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-4785.patch, HBASE-4785.patch
>
>
> When a region server dies, the HBase client waits until the RPC times out 
> before learning that it needs to check META to find the new location of the 
> region. And it incurs this *timeout* cost for every region being served by 
> the dead region server. Remove this overhead by clearing the entries in cache 
> that have the dead region server as their values.
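
The idea above, evicting every cached location that points at the dead server in one pass, can be sketched as follows. This is not the HConnectionManager code: LocationCache and its methods are hypothetical, and plain strings stand in for region names and server addresses.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LocationCache {
    // region name -> server currently believed to host it
    private final Map<String, String> regionToServer = new ConcurrentHashMap<>();

    void put(String region, String server) {
        regionToServer.put(region, server);
    }

    String get(String region) {
        return regionToServer.get(region);
    }

    /** Drops every cached location that points at the dead server; returns how many were evicted. */
    int clearServer(String deadServer) {
        int evicted = 0;
        Iterator<Map.Entry<String, String>> it = regionToServer.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(deadServer)) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    public static void main(String[] args) {
        LocationCache cache = new LocationCache();
        cache.put("region-1", "rs1:60020");
        cache.put("region-2", "rs1:60020");
        cache.put("region-3", "rs2:60020");
        System.out.println("evicted: " + cache.clearServer("rs1:60020"));
    }
}
```

After the first failure against a server, later lookups for its other regions miss the cache and go straight to META, so the timeout is paid once rather than once per region.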





[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156630#comment-13156630
 ] 

Hudson commented on HBASE-4308:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4308 Race between RegionOpenedHandler and AssignmentManager(Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java


> Race between RegionOpenedHandler and AssignmentManager
> --
>
> Key: HBASE-4308
> URL: https://issues.apache.org/jira/browse/HBASE-4308
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch
>
>
> When the master is processing a ZK event for REGION_OPENED, it calls delete() 
> on the znode before it removes the node from RegionsInTransition. If the 
> notification of that delete comes back into AssignmentManager before the 
> region is removed from RIT, you see an error like:
> 2011-08-30 17:43:29,537 WARN  [main-EventThread] 
> master.AssignmentManager(861): Node deleted but still in RIT: 
> .META.,,1.1028785192 state=OPEN, ts=1314751409532, 
> server=todd-w510,55655,1314751396840
> Not certain if it causes issues, but it's a concerning log message.





[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids

2011-11-24 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156631#comment-13156631
 ] 

Hudson commented on HBASE-4853:
---

Integrated in HBase-TRUNK-security #7 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/7/])
HBASE-4853 HBASE-4789 does overzealous pruning of seqids
HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO 
GET TED COMMENT IN
HBASE-4853 HBASE-4789 does overzealous pruning of seqids

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java


> HBASE-4789 does overzealous pruning of seqids
> -
>
> Key: HBASE-4853
> URL: https://issues.apache.org/jira/browse/HBASE-4853
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, 
> 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, 
> 4853-v9.txt, 4853.txt
>
>
> Working w/ J-D on failing replication test turned up hole in seqids made by 
> the patch over in hbase-4789.  With this patch in place we see lots of 
> instances of the suspicious: 'Last sequenceid written is empty. Deleting all 
> old hlogs'
> At a minimum, these lines need removing:
> {code}
> diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java 
> b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> index 623edbe..a0bbe01 100644
> --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
> @@ -1359,11 +1359,6 @@ public class HLog implements Syncable {
>// Cleaning up of lastSeqWritten is in the finally clause because we
>// don't want to confuse getOldestOutstandingSeqNum()
>this.lastSeqWritten.remove(getSnapshotName(encodedRegionName));
> -  Long l = this.lastSeqWritten.remove(encodedRegionName);
> -  if (l != null) {
> -LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? 
> name=" +
> -  Bytes.toString(encodedRegionName) + ", seqid=" + l);
> -   }
>this.cacheFlushLock.unlock();
>  }
>}
> {code}
> ... but above is no good w/o figuring why WALs are not being rotated off.
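
To make the failure mode easier to follow, here is a simplified model of how a map of outstanding sequence ids decides which old logs may be removed. It is an assumption-laden sketch and not the HLog code: WalPruneSketch, removableLogs, and both map shapes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class WalPruneSketch {
    /**
     * lastSeqWritten: region -> oldest sequence id not yet flushed.
     * logsByMaxSeqId: highest sequence id contained in each closed log -> log name.
     * Returns the logs whose edits are all older than every outstanding region.
     */
    static List<String> removableLogs(Map<String, Long> lastSeqWritten,
                                      NavigableMap<Long, String> logsByMaxSeqId) {
        if (lastSeqWritten.isEmpty()) {
            // Nothing is recorded as outstanding, so every closed log looks removable.
            return new ArrayList<>(logsByMaxSeqId.values());
        }
        long oldestOutstanding = Collections.min(lastSeqWritten.values());
        return new ArrayList<>(logsByMaxSeqId.headMap(oldestOutstanding, false).values());
    }

    public static void main(String[] args) {
        NavigableMap<Long, String> logs = new TreeMap<>();
        logs.put(1000L, "log-1");
        logs.put(5000L, "log-2");
        logs.put(6000L, "log-3");

        Map<String, Long> lastSeqWritten = new HashMap<>();
        lastSeqWritten.put("regionA", 5699L);
        System.out.println("with entry: " + removableLogs(lastSeqWritten, logs));

        lastSeqWritten.clear(); // over-pruned map: entries removed that should have stayed
        System.out.println("empty map: " + removableLogs(lastSeqWritten, logs));
    }
}
```

In this model a stale entry with a low sequence id pins old logs indefinitely, while removing entries too eagerly empties the map and makes every old log look deletable.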





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.2.patch

mbautin updated the revision "[jira] [HBASE-4863] Make HBase Thrift server more 
configurable and add a command-line UI test".
Reviewers: JIRA, Kannan, tedyu, stack

  Updating with the most recent version. Posted a stale version at first -- 
sorry for spam.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java


> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch, D531.2.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops the 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.





[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4863:
---

Attachment: D531.1.patch

mbautin requested code review of "[jira] [HBASE-4863] Make HBase Thrift server 
more configurable and add a command-line UI test".
Reviewers: JIRA, Kannan, tedyu, stack

  This started as an internal hotfix where we found out that the Thrift server 
spawned 15000 threads. To bound the thread pool size I added a custom thread 
pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
and made the following parameters configurable from both command line and as 
config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
Under an increasing load, the server creates new threads for every connection 
before the pool size reaches minWorkerThreads. After that, the server puts new 
connections into the queue and only creates a new thread when the queue is 
full. If an attempt to create a new thread fails, the server drops the connection. 
The default TThreadPoolServer would crash in that case, but it never happened 
because the thread pool was unbounded, so the server would hang indefinitely, 
consume a lot of memory, and cause huge latency spikes on the client side.
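
  The sizing behavior described above matches the documented semantics of java.util.concurrent.ThreadPoolExecutor with a bounded queue. The sketch below assumes (this message does not confirm it) that minWorkerThreads, maxWorkerThreads, and maxQueuedRequests map onto the core pool size, maximum pool size, and queue capacity. BoundedPoolDemo and its methods are hypothetical names, not the HBaseThreadPoolServer code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPoolDemo {
    static ThreadPoolExecutor newPool(int minWorkerThreads, int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(minWorkerThreads, maxWorkerThreads,
            60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(maxQueuedRequests));
    }

    /** Submits count tasks that block on the latch; returns how many were rejected. */
    static int submitBlocking(ThreadPoolExecutor pool, CountDownLatch release, int count) {
        int rejected = 0;
        for (int i = 0; i < count; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a long-lived client connection
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // pool and queue are both full: the request is dropped
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool(2, 4, 3);
        CountDownLatch release = new CountDownLatch(1);

        submitBlocking(pool, release, 5); // 2 run on core threads, 3 wait in the queue
        System.out.println("threads=" + pool.getPoolSize()
            + " queued=" + pool.getQueue().size());

        int rejected = submitBlocking(pool, release, 3); // queue full: grow to max, then reject
        System.out.println("threads=" + pool.getPoolSize() + " rejected=" + rejected);

        release.countDown();
        pool.shutdown();
    }
}
```

  Bounding both the thread count and the queue turns overload into an explicit rejection instead of unbounded thread growth.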

  Another part of this fix is refactoring and unit testing of the command-line 
part of the Thrift server. The logic there is sufficiently complicated, and the 
existing ThriftServer class does not test that part at all. The new 
TestThriftServerCmdLine test starts the Thrift server on a random port with 
various combinations of options and talks to it through the client API from 
another thread.


TEST PLAN
  Unit tests, cluster test with a Python Thrift client.
  I will post an update when I'm done with testing.

REVISION DETAIL
  https://reviews.facebook.net/D531

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  src/main/java/org/apache/hadoop/hbase/util/Threads.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java
  src/test/java/org/apache/hadoop/hbase/util/TestThreads.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/1167/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
>  Issue Type: Improvement
>Reporter: Mikhail Bautin
>Assignee: Mikhail Bautin
> Attachments: D531.1.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server 
> spawned 15000 threads. To bound the thread pool size I added a custom thread 
> pool server implementation called HBaseThreadPoolServer into the HBase codebase, 
> and made the following parameters configurable from both command line and as 
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
> Under an increasing load, the server creates new threads for every connection 
> before the pool size reaches minWorkerThreads. After that, the server puts 
> new connections into the queue and only creates a new thread when the queue 
> is full. If an attempt to create a new thread fails, the server drops 
> connection. The default TThreadPoolServer would crash in that case, but it 
> never happened because the thread pool was unbounded, so the server would 
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on 
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line 
> part of the Thrift server. The logic there is sufficiently complicated, and 
> the existing ThriftServer class does not test that part at all. The new 
> TestThriftServerCmdLine test starts the Thrift server on a random port with 
> various combinations of options and talks to it through the client API from 
> another thread.





[jira] [Created] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test

2011-11-24 Thread Mikhail Bautin (Created) (JIRA)
Make HBase Thrift server more configurable and add a command-line UI test
-

                 Key: HBASE-4863
                 URL: https://issues.apache.org/jira/browse/HBASE-4863
             Project: HBase
          Issue Type: Improvement
            Reporter: Mikhail Bautin
            Assignee: Mikhail Bautin


This started as an internal hotfix where we found out that the Thrift server 
spawned 15000 threads. To bound the thread pool size I added a custom thread 
pool server implementation called HBaseThreadPoolServer to the HBase codebase, 
and made the following parameters configurable both from the command line and 
as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. 
Under increasing load, the server creates a new thread for every connection 
until the pool size reaches minWorkerThreads. After that, the server puts new 
connections into the queue and only creates a new thread when the queue is 
full. If an attempt to create a new thread fails, the server drops the connection. 
The default TThreadPoolServer would crash in that case, but it never happened 
because the thread pool was unbounded, so the server would hang indefinitely, 
consume a lot of memory, and cause huge latency spikes on the client side.
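
For readers unfamiliar with this sizing policy, here is a minimal sketch of the 
same three knobs using the JDK's own java.util.concurrent.ThreadPoolExecutor. 
It is not the HBaseThreadPoolServer code from the patch: the class name 
BoundedPoolSketch, the helpers newPool and tryServe, and the 60-second 
keep-alive are assumptions made for illustration. The JDK executor happens to 
follow the same order (fill the core threads, then the queue, then grow to the 
maximum), and here a rejected task stands in for a dropped connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: a bounded worker pool built on the JDK executor,
// not the HBaseThreadPoolServer implementation from the patch.
final class BoundedPoolSketch {

    // Parameter names mirror the settings named in the description;
    // the 60-second idle keep-alive is an assumption.
    static ThreadPoolExecutor newPool(int minWorkerThreads,
                                      int maxWorkerThreads,
                                      int maxQueuedRequests) {
        return new ThreadPoolExecutor(
            minWorkerThreads,              // threads created eagerly, one per task
            maxWorkerThreads,              // hard upper bound on pool size
            60L, TimeUnit.SECONDS,         // idle time before extra threads exit
            new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
    }

    // Returns false instead of throwing when the pool and queue are both
    // full, which is where a server would drop the connection.
    static boolean tryServe(ThreadPoolExecutor pool, Runnable handler) {
        try {
            pool.execute(handler);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        // A handler that stays busy until released, like a long-lived connection.
        Runnable busy = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        ThreadPoolExecutor pool = newPool(1, 2, 1);
        for (int i = 1; i <= 4; i++) {
            boolean accepted = tryServe(pool, busy);
            System.out.println("connection " + i + " accepted=" + accepted
                + " poolSize=" + pool.getPoolSize()
                + " queued=" + pool.getQueue().size());
        }
        release.countDown();
        pool.shutdown();
    }
}
```

With minWorkerThreads=1, maxWorkerThreads=2 and maxQueuedRequests=1, the first 
connection gets a thread, the second is queued, the third forces a second 
thread because the queue is full, and the fourth is refused.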

Another part of this fix is refactoring and unit testing of the command-line 
part of the Thrift server. The logic there is sufficiently complicated, and the 
existing ThriftServer class does not test that part at all. The new 
TestThriftServerCmdLine test starts the Thrift server on a random port with 
various combinations of options and talks to it through the client API from 
another thread.
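
The general shape of such a test can be sketched with plain sockets. This is 
not TestThriftServerCmdLine itself and uses no Thrift or HBase classes: the 
class RandomPortSketch and its roundTrip method are hypothetical, and a 
one-line echo server stands in for the Thrift server. The idea it shows is 
binding to port 0 so the OS picks a free port, running the server side in its 
own thread, and talking to it from the other one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative only: an echo server stands in for the Thrift server.
final class RandomPortSketch {

    // Starts a one-shot echo server on an OS-assigned port, sends one line
    // to it from the calling thread, and returns the server's reply.
    static String roundTrip(String message) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // 0 = any free port
            int port = server.getLocalPort();

            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                         s.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine());             // echo one line back
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            serverThread.start();

            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port);
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                     client.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

Letting the OS choose the port avoids collisions when several test runs share 
a machine, which is presumably why the new test uses a random port as well.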
