[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-20 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258373#comment-13258373
 ] 

Jimmy Xiang commented on HBASE-5824:


Yes, this patch is for 0.96 only.

RetriesExhaustedWithDetailsException applies to batch processing only.

For a single action, the individual exception is used.  Currently only Put is 
implicitly batched.
Should I change single Put to use RetriesExhaustedWithDetailsException too?
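
For illustration, a minimal sketch of the distinction above, written against the 
0.9x-era client API (the class and method names are the existing client ones; 
nothing here is from the patch itself):

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;

public class PutFailureHandlingSketch {
  static void batchVsSingle(HTable table, List<Put> puts, Put single) throws IOException {
    try {
      table.put(puts);   // implicitly batched: failures come back aggregated
    } catch (RetriesExhaustedWithDetailsException e) {
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("row " + e.getRow(i) + " failed: " + e.getCause(i));
      }
    }
    try {
      table.put(single); // single action: the individual exception surfaces directly
    } catch (IOException e) {
      System.err.println("single put failed: " + e);
    }
  }
}
{code}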

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 5824-addendum-v2.txt, hbase-5824.patch, 
 hbase-5824_v2.patch, hbase_5824.addendum


 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5621) Convert admin protocol of HRegionInterface to PB

2012-04-20 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258379#comment-13258379
 ] 

Jimmy Xiang commented on HBASE-5621:


Looking into the failed unit tests.

 Convert admin protocol of HRegionInterface to PB
 

 Key: HBASE-5621
 URL: https://issues.apache.org/jira/browse/HBASE-5621
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5621_v3.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-20 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258381#comment-13258381
 ] 

Jimmy Xiang commented on HBASE-5824:


@Ted, I filed HBASE-5845.  Thanks for pointing out the issue.  Good catch.

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 5824-addendum-v2.txt, hbase-5824.patch, 
 hbase-5824_v2.patch, hbase_5824.addendum


 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-19 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257865#comment-13257865
 ] 

Jimmy Xiang commented on HBASE-5824:


If autoFlush is not enabled, Puts are most likely batched.  It is not very 
efficient to check whether a batch contains only one Put, which would duplicate 
some of the multi-put logic.

You could say the patch is not strictly for single Puts.
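
A rough sketch of the buffering behavior being described, assuming the old HTable 
client API (setAutoFlush/flushCommits): with autoFlush off, even a lone Put goes 
through the write buffer and is flushed via the batch path.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedPutSketch {
  static void bufferedPut(HTable table) throws IOException {
    table.setAutoFlush(false);                  // puts are buffered on the client
    Put p = new Put(Bytes.toBytes("row1"));
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(p);                               // queued in the write buffer, not sent yet
    table.flushCommits();                       // sent through the batch path, even for one Put
  }
}
{code}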

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
 Attachments: hbase-5824.patch, hbase-5824_v2.patch


 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-19 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257904#comment-13257904
 ] 

Jimmy Xiang commented on HBASE-5824:


I am looking into it.

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5824.patch, hbase-5824_v2.patch


 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-18 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257083#comment-13257083
 ] 

Jimmy Xiang commented on HBASE-5824:


I will add a unit test for this and fix it.

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang

 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5824) HRegion.incrementColumnValue is not used in trunk

2012-04-18 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257143#comment-13257143
 ] 

Jimmy Xiang commented on HBASE-5824:


I looked into it, and it does not seem to be a bug.  HRegion#incrementColumnValue is a 
redundant method: HRegion#increment can do the same thing.  That's why I used 
HRegion#increment.  Is anything wrong with that?

As for single Puts, the reason is that the client side tries to use batch 
processing.  This behaves the same as before.  Of course, we can enhance it.  
I will do that in HBASE-5621.
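
As a hedged illustration of why the region-side method is redundant: the client 
call can be expressed as a one-column Increment, so both paths end up as the same 
increment call on the server.  The sketch below uses the public client API and is 
not the actual patch.

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class IcvAsIncrementSketch {
  static long incrementColumnValue(HTableInterface table, byte[] row, byte[] family,
      byte[] qualifier, long amount) throws IOException {
    Increment inc = new Increment(row);
    inc.addColumn(family, qualifier, amount);   // a one-column increment
    Result r = table.increment(inc);            // same wire call as any other increment
    return Bytes.toLong(r.getValue(family, qualifier));
  }
}
{code}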

 HRegion.incrementColumnValue is not used in trunk
 -

 Key: HBASE-5824
 URL: https://issues.apache.org/jira/browse/HBASE-5824
 Project: HBase
  Issue Type: Bug
Reporter: Elliott Clark
Assignee: Jimmy Xiang

 On 0.94, a call to client.HTable#incrementColumnValue will invoke 
 HRegion#incrementColumnValue.  On trunk, all calls to 
 HTable#incrementColumnValue go to HRegion#increment.
 My guess is that HTable#incrementColumnValue and HTable#increment serialize 
 to the same thing over the wire, so the remote HRegionServer no longer 
 knows which HTable method was called.
 To repro, I checked out trunk, put a break point in 
 HRegion#incrementColumnValue, and then ran TestFromClientSide.  The breakpoint 
 wasn't hit.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-17 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255704#comment-13255704
 ] 

Jimmy Xiang commented on HBASE-5620:


@Stack, not every invocation will throw an exception.  When one does throw, it 
should be a ServiceException for PB; it used to be an IOException.  Without the 
change, a PB call won't get a ServiceException when something goes wrong.  It 
gets an undeclared exception whose cause is an IOException, and the upper layer 
doesn't know how to handle it.

The Set in Invocation is used to decide whether a protocol is a PB one, so 
ServiceException should be used.  I put it there because it is used for both 
WritableRpcEngine and SecureRpcEngine.
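
A minimal sketch of the wrap/unwrap pattern being described; the helper names are 
illustrative only, not the actual patch code.

{code:java}
import java.io.IOException;

import com.google.protobuf.ServiceException;

public final class PbExceptionSketch {
  // server side: a PB-declared method may only throw ServiceException
  static void rethrowForPb(IOException ioe) throws ServiceException {
    throw new ServiceException(ioe);
  }

  // client side: recover the original IOException for the upper layer
  static IOException unwrap(ServiceException se) {
    Throwable cause = se.getCause();
    return (cause instanceof IOException) ? (IOException) cause : new IOException(se);
  }
}
{code}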


 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, 
 hbase-5620_v4.patch, hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254979#comment-13254979
 ] 

Jimmy Xiang commented on HBASE-5620:


I did some testing with YCSB (mostly inserts).  The patch gave better 
performance, which was a surprise to me.
I will do some read-only testing with YCSB too.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, 
 hbase-5620_v4.patch, hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-15 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254360#comment-13254360
 ] 

Jimmy Xiang commented on HBASE-5620:


Thanks for reviewing.  Both the regular test suite and the security test suite 
are green for me, i.e. all tests in each suite pass.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, 
 hbase-5620_v4.patch, hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-14 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254153#comment-13254153
 ] 

Jimmy Xiang commented on HBASE-5620:


@Stack, thanks.

@Ted, I am looking into it now.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-14 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254216#comment-13254216
 ] 

Jimmy Xiang commented on HBASE-5620:


TestForceCacheImportantBlocks is green for me.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620-sec.patch, hbase-5620_v3.patch, 
 hbase-5620_v4.patch, hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253377#comment-13253377
 ] 

Jimmy Xiang commented on HBASE-5620:


I will take a look at the test failures.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253538#comment-13253538
 ] 

Jimmy Xiang commented on HBASE-5620:


TestWALPlayer passed for me.  Maybe I didn't have the latest from trunk?

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253799#comment-13253799
 ] 

Jimmy Xiang commented on HBASE-5620:


@Stack, I will move ClientProtocol.java and AdminProtocol.java to the top level in 
HBASE-5621 since they are common.  I added HBASE-5785 to track the unit test 
issue.

@Ted, can I check the licenses without doing a release build?

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253840#comment-13253840
 ] 

Jimmy Xiang commented on HBASE-5620:


I ran the Apache RAT check (mvn apache-rat:check), and it is OK.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5620) Convert the client protocol of HRegionInterface to PB

2012-04-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253852#comment-13253852
 ] 

Jimmy Xiang commented on HBASE-5620:


@Stack, thanks a lot!

I moved them to the top level in HBASE-5621 and posted a review request.  Could 
you please review it?
I am OK with moving them to the client package.

 Convert the client protocol of HRegionInterface to PB
 -

 Key: HBASE-5620
 URL: https://issues.apache.org/jira/browse/HBASE-5620
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5620_v3.patch, hbase-5620_v4.patch, 
 hbase-5620_v4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5777) MiniHBaseCluster cannot start multiple region servers

2012-04-12 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252899#comment-13252899
 ] 

Jimmy Xiang commented on HBASE-5777:


I see.  When I run unit tests in Eclipse, the hbase-site.xml at src/test is not 
used.
How about disabling the UI in MiniHBaseCluster too?
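
A small sketch of the two workarounds discussed here, using the configuration keys 
named in this issue (REGIONSERVER_INFO_PORT_AUTO to auto-pick a free info port, or 
the info-port key set to -1 to skip the UI); this is illustrative, not the attached 
patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;

public class MiniClusterInfoPortSketch {
  static Configuration testConf() {
    Configuration conf = HBaseConfiguration.create();
    conf.setBoolean(HConstants.REGIONSERVER_INFO_PORT_AUTO, true); // each RS picks a free port
    // or skip the region server web UI entirely:
    // conf.setInt(HConstants.REGIONSERVER_INFO_PORT, -1);
    return conf;
  }
}
{code}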

 MiniHBaseCluster cannot start multiple region servers
 -

 Key: HBASE-5777
 URL: https://issues.apache.org/jira/browse/HBASE-5777
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5777.patch


 MiniHBaseCluster can try to start multiple region servers, but all of them 
 except one will die while putting up the web UI 
 with a BindException, since HConstants.REGIONSERVER_INFO_PORT_AUTO is set 
 to false by default.
 This makes many unit tests that depend on multiple region servers flaky, 
 such as TestAdmin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5740) Compaction interruption may be due to balancing

2012-04-09 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249951#comment-13249951
 ] 

Jimmy Xiang commented on HBASE-5740:


@JD, any comments on the second patch?

 Compaction interruption may be due to balancing
 --

 Key: HBASE-5740
 URL: https://issues.apache.org/jira/browse/HBASE-5740
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 0.96.0

 Attachments: hbase-5740.patch, hbase-5740_v2.patch


 Currently, the log shows 
 Aborting compaction of store LOG in region  because user requested stop.
 But it is actually because of balancing.
 Currently, there is no way to figure out who closed the region, so it is 
 better to change the message to say it is because of either the user or balancing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5740) Compaction interruption may be due to balancing

2012-04-09 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250267#comment-13250267
 ] 

Jimmy Xiang commented on HBASE-5740:


@Stack, I am fine with the generic message. Please make the change on commit.  
Thanks a lot.
We don't know for sure who interrupted it anyway for now.

 Compaction interruption may be due to balancing
 --

 Key: HBASE-5740
 URL: https://issues.apache.org/jira/browse/HBASE-5740
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 0.96.0

 Attachments: hbase-5740.patch, hbase-5740_v2.patch


 Currently, the log shows 
 Aborting compaction of store LOG in region  because user requested stop.
 But it is actually because of balancing.
 Currently, there is no way to figure out who closed the region, so it is 
 better to change the message to say it is because of either the user or balancing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5734) Change hbck sideline root

2012-04-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248511#comment-13248511
 ] 

Jimmy Xiang commented on HBASE-5734:


It would be nice to expose it as an argument.  However, it doesn't offer much 
value since we don't expect hbck to be run all the time.  Users can rename it 
afterwards if they really want to.

We already have lots of arguments.


 Change hbck sideline root
 -

 Key: HBASE-5734
 URL: https://issues.apache.org/jira/browse/HBASE-5734
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.94.0, 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 0.96.0

 Attachments: hbase-5734.patch


 Currently the hbck sideline root is the root directory, which can run into 
 permission issues.  We can change it to /hbck.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5740) Compaction interruption may be due to balancing

2012-04-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249045#comment-13249045
 ] 

Jimmy Xiang commented on HBASE-5740:


Added a new patch that does not say who triggered it, since we don't know for now.

 Compaction interruption may be due to balancing
 --

 Key: HBASE-5740
 URL: https://issues.apache.org/jira/browse/HBASE-5740
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 0.96.0

 Attachments: hbase-5740.patch, hbase-5740_v2.patch


 Currently, the log shows 
 Aborting compaction of store LOG in region  because user requested stop.
 But it is actually because of balancing.
 Currently, there is no way to figure out who closed the region, so it is 
 better to change the message to say it is because of either the user or balancing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5606) SplitLogManager async delete node hangs log splitting when ZK connection is lost

2012-04-03 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245763#comment-13245763
 ] 

Jimmy Xiang commented on HBASE-5606:


It is ok with me. Hopefully, there is no other place.

 SplitLogManager async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch


 1. One region server died; the ServerShutdownHandler found it and started 
 distributed log splitting;
 2. All tasks failed due to the ZK connection being lost, so all the tasks were 
 deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. The asynchronous deletion from step 2 finally happened for the new task;
 5. This put the SplitLogManager into a hanging state.
 This leads to the .META. region not being assigned for a long time.
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-04-01 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243954#comment-13243954
 ] 

Jimmy Xiang commented on HBASE-5443:


The main reason is that the HBase Writable RPC already supports PB.  Hadoop 
uses PB too.

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242720#comment-13242720
 ] 

Jimmy Xiang commented on HBASE-5619:


@Stack, thanks!

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: 5619v6.txt, 5619v6.txt, hbase-5619.patch, 
 hbase-5619_v3.patch, hbase-5619_v4.patch, hbase-5619_v5.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241374#comment-13241374
 ] 

Jimmy Xiang commented on HBASE-5619:


@Stack, could you please commit this patch?  I do have some changes to the .proto 
files, but I'd like to address them in HBASE-5620.

Thanks.


 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch, 
 hbase-5619_v4.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241622#comment-13241622
 ] 

Jimmy Xiang commented on HBASE-5619:


@Stack, could you please install protoc and give it a try again?
From now on, we need protoc to compile. :)

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch, 
 hbase-5619_v4.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241819#comment-13241819
 ] 

Jimmy Xiang commented on HBASE-5619:


@Stack, so far I could not find a good protoc Maven plugin.  I don't remember 
whether I tried to install it on my Ubuntu box.
This is the download site for the protobuf compiler: 
http://code.google.com/p/protobuf/downloads/list
But on Linux, I think it is easy to install with rpm/apt-get.

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch, 
 hbase-5619_v4.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5667) RegexStringComparator supports java.util.regex.Pattern flags

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241861#comment-13241861
 ] 

Jimmy Xiang commented on HBASE-5667:


@Stack, I prefer to change them to PB, so we should not bother making them 
VersionedWritables for now.
We have lots of filters.  We need to abstract them out and have a generic way 
to define them in PB.


 RegexStringComparator supports java.util.regex.Pattern flags
 

 Key: HBASE-5667
 URL: https://issues.apache.org/jira/browse/HBASE-5667
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: David Arthur
Priority: Minor
 Attachments: HBASE-5667.diff


 * Add a constructor that takes in a Pattern
 * Add the Pattern's flags to the Writable fields, and actually use them when 
 recomposing the Filter

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5667) RegexStringComparator supports java.util.regex.Pattern flags

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241870#comment-13241870
 ] 

Jimmy Xiang commented on HBASE-5667:


This patch changes the constructor of RegexStringComparator.  A Pattern is hard 
to serialize to PB.  Can we specify the flags in a different way,
for example using a string and/or some primitive parameters?
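
A hedged sketch of that alternative: carry the expression as a String plus the int 
flags (both easy to serialize as Writable or PB) and rebuild the Pattern when the 
comparator is recomposed.  Illustrative only, not the HBASE-5667 patch.

{code:java}
import java.util.regex.Pattern;

public class RegexComparatorSketch {
  private final String expr;
  private final int flags;            // e.g. Pattern.CASE_INSENSITIVE | Pattern.DOTALL
  private transient Pattern pattern;  // rebuilt locally, never serialized

  public RegexComparatorSketch(String expr, int flags) {
    this.expr = expr;
    this.flags = flags;
    this.pattern = Pattern.compile(expr, flags);
  }

  // the String/int pair is what would travel over the wire (Writable or PB)
  public String getExpr() { return expr; }
  public int getFlags()   { return flags; }

  public boolean matches(String value) {
    return pattern.matcher(value).find();
  }
}
{code}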

 RegexStringComparator supports java.util.regex.Pattern flags
 

 Key: HBASE-5667
 URL: https://issues.apache.org/jira/browse/HBASE-5667
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: David Arthur
Priority: Minor
 Attachments: HBASE-5667.diff


 * Add a constructor that takes in a Pattern
 * Add the Pattern's flags to the Writable fields, and actually use them when 
 recomposing the Filter

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241880#comment-13241880
 ] 

Jimmy Xiang commented on HBASE-5619:


But for the .proto files, other projects depend on protoc as well, for example 
Hadoop/HDFS.  Since we are moving towards PB, a protoc dependency should be fine.
I can try to set up a temporary protoc dynamically.

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch, 
 hbase-5619_v4.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5667) RegexStringComparator supports java.util.regex.Pattern flags

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241914#comment-13241914
 ] 

Jimmy Xiang commented on HBASE-5667:


Pattern is fine if we can get it.

 RegexStringComparator supports java.util.regex.Pattern flags
 

 Key: HBASE-5667
 URL: https://issues.apache.org/jira/browse/HBASE-5667
 Project: HBase
  Issue Type: Improvement
  Components: filters
Reporter: David Arthur
Priority: Minor
 Attachments: HBASE-5667.diff


 * Add a constructor that takes in a Pattern
 * Add the Pattern's flags to the Writable fields, and actually use them when 
 recomposing the Filter

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5619) Create PB protocols for HRegionInterface

2012-03-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241954#comment-13241954
 ] 

Jimmy Xiang commented on HBASE-5619:


That's what we do for Thrift now, not Avro.

 Create PB protocols for HRegionInterface
 

 Key: HBASE-5619
 URL: https://issues.apache.org/jira/browse/HBASE-5619
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: hbase-5619.patch, hbase-5619_v3.patch, 
 hbase-5619_v4.patch


 Subtask of HBASE-5443: separate HRegionInterface into an admin protocol and 
 a client protocol, and create the protocol buffer (.proto) files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5606) SplitLogManager async delete node hangs log splitting when ZK connection is lost

2012-03-26 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238811#comment-13238811
 ] 

Jimmy Xiang commented on HBASE-5606:


@Prakash, could there be other places where a failed delete can cause this issue?

Would it be a cleaner fix to change the async delete to a sync delete?  With a 
sync delete, we can avoid all these race conditions, and the retry will get a 
fresh start each time.
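
A minimal sketch of the synchronous-delete idea, using the plain ZooKeeper client 
API with a bounded retry loop; illustrative only, not a proposed patch.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SyncDeleteSketch {
  static void deleteTaskNode(ZooKeeper zk, String path, int maxRetries)
      throws KeeperException, InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        zk.delete(path, -1);                 // -1 = any version; blocks until the delete is done
        return;
      } catch (KeeperException.NoNodeException e) {
        return;                              // already gone, nothing left to race against
      } catch (KeeperException.ConnectionLossException e) {
        if (attempt >= maxRetries) throw e;  // the caller could abort the master here
      }
    }
  }
}
{code}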

 SplitLogManager async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 
 0001-HBASE-5606-SplitLogManger-async-delete-node-hangs-lo.patch, 5606.txt


 1. One region server died; the ServerShutdownHandler found it and started 
 distributed log splitting;
 2. All tasks failed due to the ZK connection being lost, so all the tasks were 
 deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. The asynchronous deletion from step 2 finally happened for the new task;
 5. This put the SplitLogManager into a hanging state.
 This leads to the .META. region not being assigned for a long time.
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5606) SplitLogManager async delete node hangs log splitting when ZK connection is lost

2012-03-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237647#comment-13237647
 ] 

Jimmy Xiang commented on HBASE-5606:


This is a similar issue to HBASE-5081, right?

Would my original fix proposed for HBASE-5081 help: don't retry distributed log 
splitting before the tasks are actually deleted?
We can abort the master after several retries to delete the tasks.

 SplitLogManager async delete node hangs log splitting when ZK connection is 
 lost 
 

 Key: HBASE-5606
 URL: https://issues.apache.org/jira/browse/HBASE-5606
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0
Reporter: Gopinathan A
Priority: Critical
 Fix For: 0.92.2

 Attachments: 5606.txt


 1. One region server died; the ServerShutdownHandler found it and started 
 distributed log splitting;
 2. All tasks failed due to the ZK connection being lost, so all the tasks were 
 deleted asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. The asynchronous deletion from step 2 finally happened for the new task;
 5. This put the SplitLogManager into a hanging state.
 This leads to the .META. region not being assigned for a long time.
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(55413,79):2012-03-14 
 19:28:47,932 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89303,79):2012-03-14 
 19:34:32,387 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: put up 
 splitlog task at znode 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}
 {noformat}
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(80417,99):2012-03-14 
 19:34:31,196 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 hbase-root-master-HOST-192-168-47-204.log.2012-03-14(89456,99):2012-03-14 
 19:34:32,497 DEBUG 
 org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback: deleted 
 /hbase/splitlog/hdfs%3A%2F%2F192.168.47.205%3A9000%2Fhbase%2F.logs%2Flinux-114.site%2C60020%2C1331720381665-splitting%2Flinux-114.site%252C60020%252C1331720381665.1331752316170
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-03-22 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235780#comment-13235780
 ] 

Jimmy Xiang commented on HBASE-5443:


I have made some code changes, and some tests failed.  It is very hard to look 
into them, so I'd like to break the work into small pieces and tackle them one by one.

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-03-02 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221155#comment-13221155
 ] 

Jimmy Xiang commented on HBASE-5443:


I updated the review with a new diff, which incorporates the feedback from all 
reviewers.  Thanks a lot for the reviews.
 

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-02 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221166#comment-13221166
 ] 

Jimmy Xiang commented on HBASE-5451:


I hope we can.  I know the RPC won't be backward compatible.  How about the 
client code?  We definitely won't break any existing client applications, right?

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5451) Switch RPC call envelope/headers to PBs

2012-03-01 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220249#comment-13220249
 ] 

Jimmy Xiang commented on HBASE-5451:


I did a quick review last night.  Looks OK to me.
As for the pom change, we have the same change, so it should be fine.

In my patch, I put the generated files under org.apache.hadoop.hbase.protobuf.
Should I put them under org.apache.hadoop.hbase.ipc.protobuf too?

 Switch RPC call envelope/headers to PBs
 ---

 Key: HBASE-5451
 URL: https://issues.apache.org/jira/browse/HBASE-5451
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Affects Versions: 0.94.0
Reporter: Todd Lipcon
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: rpc-proto.2.txt, rpc-proto.3.txt, rpc-proto.patch.1_2




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3909) Add dynamic config

2012-03-01 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220404#comment-13220404
 ] 

Jimmy Xiang commented on HBASE-3909:


If they lose them, it could be very bad.  It may be too late by the time someone 
sees something weird and realizes their configs are gone.  I think it is safer 
to persist them somewhere.

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.96.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev, and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFailover and ServerShutdownHandler

2012-02-27 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217364#comment-13217364
 ] 

Jimmy Xiang commented on HBASE-5270:


@Stack, I agree.  I think we should reuse the existing exception if we can.

 Handle potential data loss due to concurrent processing of processFailover 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
 Fix For: 0.92.1, 0.94.0

 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
 hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 This method param names are not right 'definitiveRootServer'; what is meant 
 by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if its carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in master single 
 process splitting under some conditions with this patch. Its not explained in 
 code why we would do this. Why do we think master log splitting 'high 
 priority' when it could very well be slower. Should we only go this route if 
 distributed splitting is not going on. Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5443) Add PB-based calls to HRegionInterface

2012-02-27 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217442#comment-13217442
 ] 

Jimmy Xiang commented on HBASE-5443:


We can still support multi(MultiAction).  Should we still support it in the RPC 
layer?
Can we put some logic on the client side, like aggregating the actions based
on region, action type (put/delete/get), and so on?
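
A rough sketch of the kind of client-side grouping I have in mind (just a sketch; the class and method names are made up, not the actual client code):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.Row;

public class ActionGrouper {
  // Group actions by the region hosting each row, so each group can go out
  // as a single RPC to its region server.
  static Map<HRegionLocation, List<Row>> groupByRegion(HConnection connection,
      byte[] tableName, List<Row> actions) throws java.io.IOException {
    Map<HRegionLocation, List<Row>> grouped =
        new HashMap<HRegionLocation, List<Row>>();
    for (Row action : actions) {
      HRegionLocation loc = connection.locateRegion(tableName, action.getRow());
      List<Row> group = grouped.get(loc);
      if (group == null) {
        group = new ArrayList<Row>();
        grouped.put(loc, group);
      }
      group.add(action);
    }
    return grouped;
  }
}
{code}

Splitting further by action type (put/delete/get) within each group would just be a second pass over the same map.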

 Add PB-based calls to HRegionInterface
 --

 Key: HBASE-5443
 URL: https://issues.apache.org/jira/browse/HBASE-5443
 Project: HBase
  Issue Type: Sub-task
  Components: ipc, master, migration, regionserver
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.96.0

 Attachments: region_java-proto-mapping.pdf




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-26 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216758#comment-13216758
 ] 

Jimmy Xiang commented on HBASE-3909:


@Stack, we don't have to poll the fs to find changes. We can just put the 
last-modified date of the file in ZK.  Once the last-modified date changes, we 
can reload the file.

When a new regionserver joins a cluster, it should always check whether any 
configuration has changed based on the configuration file's last-modified 
date, which serves as a kind of version number for the file.
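
Something like the following is what I am picturing (a minimal sketch; the znode path and class name are made up, and the znode is assumed to hold the file's last-modified time as a string):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.zookeeper.ZooKeeper;

public class ConfigChangeChecker {
  private long lastLoadedTimestamp = -1;

  // Compare the last-modified time published in ZK with the one we loaded
  // last; if it is newer, clear the cached properties so they are re-read.
  boolean reloadIfChanged(ZooKeeper zk, Configuration conf) throws Exception {
    byte[] data = zk.getData("/hbase/conf-last-modified", false, null);
    long published = Long.parseLong(new String(data, "UTF-8"));
    if (published > lastLoadedTimestamp) {
      conf.reloadConfiguration();
      lastLoadedTimestamp = published;
      return true;
    }
    return false;
  }
}
{code}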


 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216095#comment-13216095
 ] 

Jimmy Xiang commented on HBASE-3909:


Can we put dynamic configuration somewhere in HDFS, for example, in a file 
under hbase.rootdir?

We can put static configuration in hbase-site.xml, and dynamic configuration in 
a file under hbase.rootdir.

We can also enhance the hbase shell or the master UI to view/change those dynamic 
configurations.


 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216105#comment-13216105
 ] 

Jimmy Xiang commented on HBASE-3909:


For these dynamic configurations, we can cache them in memory and 
create a separate thread to reload the cache periodically,
so the reload is transparent to the configuration reader.
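
A minimal sketch of what that could look like (made-up class; the loader stands in for whatever reads the dynamic config file from wherever we decide to keep it):

{code}
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DynamicConfigCache {
  private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
  private final ScheduledExecutorService refresher =
      Executors.newSingleThreadScheduledExecutor();

  // 'loader' stands in for whatever reads the dynamic config store (HDFS, ZK, ...).
  // Readers only ever see the in-memory map, so the background reload is
  // transparent to them.
  public void start(final Callable<Map<String, String>> loader, long periodSeconds) {
    refresher.scheduleAtFixedRate(new Runnable() {
      public void run() {
        try {
          cache.putAll(loader.call());
        } catch (Exception e) {
          // keep serving the old values if a reload fails
        }
      }
    }, 0, periodSeconds, TimeUnit.SECONDS);
  }

  public String get(String key) {
    return cache.get(key);
  }
}
{code}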

 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216114#comment-13216114
 ] 

Jimmy Xiang commented on HBASE-5270:


Instead of introducing safe mode, can we add something to the RPC server so it 
does not serve traffic before the actual server is ready, for example, 
fully initialized?
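
To illustrate the idea (a made-up helper, not actual HBase server code): every RPC handler would consult a flag that is only flipped once initialization finishes.

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

public class InitializationGate {
  private final AtomicBoolean initialized = new AtomicBoolean(false);

  // Called once the server has finished its startup work.
  public void markInitialized() {
    initialized.set(true);
  }

  // Called at the top of every RPC handler; clients see a retryable error
  // instead of hitting a half-initialized server.
  public void checkReady() throws IOException {
    if (!initialized.get()) {
      throw new IOException("Server is not yet fully initialized; please retry");
    }
  }
}
{code}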

 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
Assignee: chunhui shen
 Fix For: 0.92.1, 0.94.0

 Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
 hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 This method param names are not right 'definitiveRootServer'; what is meant 
 by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if its carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in master single 
 process splitting under some conditions with this patch. Its not explained in 
 code why we would do this. Why do we think master log splitting 'high 
 priority' when it could very well be slower. Should we only go this route if 
 distributed splitting is not going on. Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3909) Add dynamic config

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216131#comment-13216131
 ] 

Jimmy Xiang commented on HBASE-3909:


Yes, I meant transparent to the configuration reader.  My assumption is that the 
change doesn't have to take effect right away; some delay is fine.

If we really want to use ZK, we can still use a central file as the persistent store.



 Add dynamic config
 --

 Key: HBASE-3909
 URL: https://issues.apache.org/jira/browse/HBASE-3909
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.94.0


 I'm sure this issue exists already, at least as part of the discussion around 
 making online schema edits possible, but no harm in this having its own issue.  
 Ted started a conversation on this topic up on dev and Todd suggested we 
 look at how Hadoop did it over in HADOOP-7001

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5472) LoadIncrementalHFiles loops forever if the target table misses a CF

2012-02-24 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216141#comment-13216141
 ] 

Jimmy Xiang commented on HBASE-5472:


In such a case, should the tool ignore the missing column family, or just error 
out? 

 LoadIncrementalHFiles loops forever if the target table misses a CF
 ---

 Key: HBASE-5472
 URL: https://issues.apache.org/jira/browse/HBASE-5472
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Lars Hofhansl
Priority: Minor

 I have some HFiles for two column families 'y','z', but I specified a target 
 table that only has CF 'y'.
 I see the following repeated forever.
 ...
 12/02/23 22:57:37 WARN mapreduce.LoadIncrementalHFiles: Attempt to bulk load 
 region containing  into table z with files [family:y 
 path:hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09, 
 family:z 
 path:hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d] failed. 
  This is recoverable and they will be retried.
 12/02/23 22:57:37 DEBUG client.MetaScanner: Scanning .META. starting at 
 row=z,,00 for max=2147483647 rows using 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7b7a4989
 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Split occured while 
 grouping HFiles, retry attempt 1596 with 2 files remaining to group or split
 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load 
 hfile=hdfs://bunnypig:9000/bulk/z2/y/bd6f1c3cc8b443fc9e9e5fddcdaa3b09 first=r 
 last=r
 12/02/23 22:57:37 INFO mapreduce.LoadIncrementalHFiles: Trying to load 
 hfile=hdfs://bunnypig:9000/bulk/z2/z/38f12fdbb7de40e8bf0e6489ef34365d first=r 
 last=r
 12/02/23 22:57:37 DEBUG mapreduce.LoadIncrementalHFiles: Going to connect to 
 server region=z,,1330066309814.d5fa76a38c9565f614755e34eacf8316., 
 hostname=localhost, port=60020 for row 
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-22 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214146#comment-13214146
 ] 

Jimmy Xiang commented on HBASE-4403:


It should use hbase-4403.patch instead of hbase-4403-interface_v3.txt. :)  Just 
retried.

 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.94.0

 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, 
 hbase-4403-interface_v3.txt, hbase-4403-nowhere-near-done.txt, 
 hbase-4403.patch, hbase-4403.patch


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-17 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13210666#comment-13210666
 ] 

Jimmy Xiang commented on HBASE-4403:


Sounds great.

 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, hbase-4403-interface_v2.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209738#comment-13209738
 ] 

Jimmy Xiang commented on HBASE-4403:


@Stack, thanks a lot for the review.  I will incorporate the changes into the next 
version.
How about the coprocessor and REST related classes?

As for the classification definitions, HADOOP-5073 has some info and background.
Todd has a short summary which is very good:

{quote}
if it's Private, we can change it (and don't need a
stability mark). If it's public but unstable, we can change it. If
it's public/evolving, we're allowed to change it but should try not
to. If it's public and stable we can't change it without a deprecation
path or with a GREAT reason.
{quote}
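
To make that concrete, this is roughly how a tagged class would look, assuming we reuse Hadoop's annotation classes (the package may end up different once they are ported into HBase); the class itself is just a placeholder:

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Public + Evolving: callers may use it, but we are still allowed to change
// the API, although we should try not to.
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class SomeClientFacingClass {
  // ...
}
{code}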


 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209772#comment-13209772
 ] 

Jimmy Xiang commented on HBASE-4403:


Cool, thanks.  Please add the definitions to the book in a new jira.  This one 
may take a while.

 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4403) Adopt interface stability/audience classifications from Hadoop

2012-02-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13209901#comment-13209901
 ] 

Jimmy Xiang commented on HBASE-4403:


Yes, I will do that in a separate jira.

 Adopt interface stability/audience classifications from Hadoop
 --

 Key: HBASE-4403
 URL: https://issues.apache.org/jira/browse/HBASE-4403
 Project: HBase
  Issue Type: Task
Affects Versions: 0.90.5, 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-4403-interface.txt, 
 hbase-4403-nowhere-near-done.txt


 As HBase gets more widely used, we need to be more explicit about which APIs 
 are stable and not expected to break between versions, which APIs are still 
 evolving, etc. We also have many public classes that are really internal to 
 the RS or Master and not meant to be used by users. Hadoop has adopted a 
 classification scheme for audience (public, private, or limited-private) as 
 well as stability (stable, evolving, unstable). I think we should copy these 
 annotations to HBase and start to classify our public classes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5394) Add ability to include Protobufs in HbaseObjectWritable

2012-02-15 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208584#comment-13208584
 ] 

Jimmy Xiang commented on HBASE-5394:


These tests passed on my local box.

 Add ability to include Protobufs in HbaseObjectWritable
 ---

 Key: HBASE-5394
 URL: https://issues.apache.org/jira/browse/HBASE-5394
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Zhihong Yu
Assignee: Jimmy Xiang
 Fix For: 0.94.0

 Attachments: hbase-5394.txt


 This is a port of HADOOP-7379
 This is to add the cases to HbaseObjectWritable to handle subclasses of 
 Message, the superclass of codegenned protobufs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5398) HBase shell disable_all/enable_all/drop_all promp wrong tables for confirmation

2012-02-14 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207914#comment-13207914
 ] 

Jimmy Xiang commented on HBASE-5398:


Yes, it takes a regex pattern and disables all matched tables.  Joey added this 
feature in HBASE-3506.

 HBase shell disable_all/enable_all/drop_all promp wrong tables for 
 confirmation
 ---

 Key: HBASE-5398
 URL: https://issues.apache.org/jira/browse/HBASE-5398
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.92.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.92.0

 Attachments: hbase-5398.patch


 When using hbase shell to disable_all/enable_all/drop_all tables, the tables 
 prompted for confirmation are wrong.
 For example, disable_all 'test*'
 will ask for confirmation to disable tables like:
 mytest1
 test123
 Fortunately, these tables will not actually be disabled, since the Java pattern 
 doesn't match this way.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5312) Closed parent region present in Hlog.lastSeqWritten

2012-02-10 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205596#comment-13205596
 ] 

Jimmy Xiang commented on HBASE-5312:


I checked the lock mechanism and it looks fine.  If it is not a bug in the Java 
reentrant lock, I suspect the region is removed from the online regions list 
before it is properly closed, either during region splitting or region closing.

 Closed parent region present in Hlog.lastSeqWritten
 ---

 Key: HBASE-5312
 URL: https://issues.apache.org/jira/browse/HBASE-5312
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7


 This is in reference to the mail sent in the dev mailing list
 Closed parent region present in Hlog.lastSeqWritten.
 The scenario described is
 We had a region that was split into two daughters.  When the hlog roll tried 
 to flush the region there was an entry in the HLog.lastSeqWritten that was 
 not flushed or removed from the lastSeqWritten during the parent close.
 Because this flush was not happening, subsequent flushes were getting blocked
 {code}
  05:06:44,422 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=122, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:06:44,422 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:10:48,666 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=123, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:10:48,666 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:14:46,075 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=124, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:14:46,075 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:15:41,584 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=125, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:15:41,584 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 {code}
 Lets see what happened for the region 2acaf8e3acfd2e8a5825a1f6f0aca4a8
 {code}
 2012-01-06 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming flushed file at 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815
  to 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123
 2012-01-06 00:30:58,946 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Instantiated 
 Htable_UFDR_016,049790700093168-0456520,1325809837958.0ebe5bd7fcbc09ee074d5600b9d4e062.
 2012-01-06 00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123,
  entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m
 2012-01-06 00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished snapshotting, commencing flushing stores
 2012-01-06 00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~133.5m for region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 
 21816ms, sequenceid=20312223, compaction requested=true
 2012-01-06 00:30:59,787 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
 for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 
 because regionserver20020.cacheFlusher; priority=0, compaction queue size=5840
 {code}
 A user-triggered split has been issued to this region, which can be seen in 
 the above logs.
 The flushing of this region has resulted in a seq id 20312223.
 The region has been split and the parent region has been closed
 {code}
 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 Starting split of region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.: 
  disabling compactions & flushes
 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates 
 disabled for region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
 00:31:13,718 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed 
 

[jira] [Commented] (HBASE-5376) Add more logging to triage HBASE-5312: Closed parent region present in Hlog.lastSeqWritten

2012-02-10 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205636#comment-13205636
 ] 

Jimmy Xiang commented on HBASE-5376:


I was thinking of using YCSB to load lots of data while setting the region size 
small, so that lots of region splits will be triggered.  How is that?
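
For the setup, something like this is what I had in mind (a sketch; the 64 MB value is arbitrary, just small enough to force frequent splits under a YCSB load):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class SmallRegionTestConf {
  // Shrink the max region size so the table splits often while YCSB loads data.
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setLong("hbase.hregion.max.filesize", 64 * 1024 * 1024L);
    return conf;
  }
}
{code}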

 Add more logging to triage HBASE-5312: Closed parent region present in 
 Hlog.lastSeqWritten
 --

 Key: HBASE-5376
 URL: https://issues.apache.org/jira/browse/HBASE-5376
 Project: HBase
  Issue Type: Sub-task
Reporter: Jimmy Xiang
Priority: Minor
 Fix For: 0.90.7


 It is hard to find out what exactly caused HBASE-5312.  Some logging will be 
 helpful to shine some lights.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-10 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205778#comment-13205778
 ] 

Jimmy Xiang commented on HBASE-5327:


I looked into it. For new Path(path), the path doesn't have to be a complete 
and valid path.  It could be a relative path, so it can't be validated.
new Path(parent, child) takes two paths to form a new one (a String is converted 
to a Path implicitly).  If parent = hdfs://localhost:999
and child = /test, the new path will be hdfs://localhost:999/test; it is 
valid and all are happy.  However, if child = test,
combining them into a URI gives hdfs://localhost:999test, which is 
invalid.  That's the reason for the URISyntaxException.
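
A tiny demo of that behavior (a throwaway class, just restating what the stack trace in the description shows):

{code}
import org.apache.hadoop.fs.Path;

public class RootDirPathDemo {
  public static void main(String[] args) {
    // Child with a leading slash combines into a valid URI.
    System.out.println(new Path("hdfs://localhost:999", "/test")); // hdfs://localhost:999/test

    // Without the leading slash the pieces are joined into
    // hdfs://localhost:999test, and Path construction fails with the
    // IllegalArgumentException wrapping a URISyntaxException.
    try {
      System.out.println(new Path("hdfs://localhost:999", "test"));
    } catch (IllegalArgumentException e) {
      System.out.println("invalid rootdir: " + e.getMessage());
    }
  }
}
{code}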

The v2 patch doesn't look good, but I am OK with it.

 Print a message when an invalid hbase.rootdir is passed
 ---

 Key: HBASE-5327
 URL: https://issues.apache.org/jira/browse/HBASE-5327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5327.txt, hbase-5327_v2.txt


 As seen on the mailing list: 
 http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124
 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a 
 path to .oldlogs:
 {noformat}
 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
 path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
 at org.apache.hadoop.fs.Path.initialize(Path.java:148)
 at org.apache.hadoop.fs.Path.init(Path.java:71)
 at org.apache.hadoop.fs.Path.init(Path.java:50)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
 hdfs://sv4r11s38:9100.oldlogs
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.init(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:145)
 ... 6 more
 {noformat}
 It could also crash anywhere else, this just happens to be the first place we 
 use hbase.rootdir. We need to verify that it's an actual folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-09 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13204714#comment-13204714
 ] 

Jimmy Xiang commented on HBASE-5327:


This patch fixes two issues:
(1) check the root dir to make sure it is valid before generating the old log 
dir, so that it can give a meaningful error message.
(2) make sure the root dir is a directory instead of a file.  If it is a file, the 
master will hang and try to create the version file forever.
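
The directory check is along these lines (a sketch of the idea, not the patch itself; the class name is made up):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RootDirValidator {
  // Fail fast with an actionable message instead of letting the master hang
  // while it tries to create the version file under a regular file.
  static void checkRootDir(FileSystem fs, Path rootdir) throws IOException {
    if (fs.exists(rootdir)) {
      FileStatus status = fs.getFileStatus(rootdir);
      if (!status.isDir()) {
        throw new IOException("hbase.rootdir " + rootdir
            + " exists but is not a directory; please point it at a directory");
      }
    }
  }
}
{code}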

@Jon, I added some actionable log messages.

 Print a message when an invalid hbase.rootdir is passed
 ---

 Key: HBASE-5327
 URL: https://issues.apache.org/jira/browse/HBASE-5327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5327.txt, hbase-5327_v2.txt


 As seen on the mailing list: 
 http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124
 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a 
 path to .oldlogs:
 {noformat}
 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
 path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
 at org.apache.hadoop.fs.Path.initialize(Path.java:148)
 at org.apache.hadoop.fs.Path.init(Path.java:71)
 at org.apache.hadoop.fs.Path.init(Path.java:50)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
 hdfs://sv4r11s38:9100.oldlogs
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.init(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:145)
 ... 6 more
 {noformat}
 It could also crash anywhere else, this just happens to be the first place we 
 use hbase.rootdir. We need to verify that it's an actual folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5327) Print a message when an invalid hbase.rootdir is passed

2012-02-09 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205064#comment-13205064
 ] 

Jimmy Xiang commented on HBASE-5327:


I actually prefer the first version.  If the root dir is invalid, HDFS will 
throw an IAE.  That's how we know a path is an invalid HDFS path.

 Print a message when an invalid hbase.rootdir is passed
 ---

 Key: HBASE-5327
 URL: https://issues.apache.org/jira/browse/HBASE-5327
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: Jean-Daniel Cryans
Assignee: Jimmy Xiang
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: hbase-5327.txt, hbase-5327_v2.txt


 As seen on the mailing list: 
 http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/24124
 If hbase.rootdir doesn't specify a folder on hdfs we crash while opening a 
 path to .oldlogs:
 {noformat}
 2012-02-02 23:07:26,292 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
 path in absolute URI: hdfs://sv4r11s38:9100.oldlogs
 at org.apache.hadoop.fs.Path.initialize(Path.java:148)
 at org.apache.hadoop.fs.Path.init(Path.java:71)
 at org.apache.hadoop.fs.Path.init(Path.java:50)
 at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:112)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:448)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
 hdfs://sv4r11s38:9100.oldlogs
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.init(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:145)
 ... 6 more
 {noformat}
 It could also crash anywhere else, this just happens to be the first place we 
 use hbase.rootdir. We need to verify that it's an actual folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5312) Closed parent region present in Hlog.lastSeqWritten

2012-02-09 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205214#comment-13205214
 ] 

Jimmy Xiang commented on HBASE-5312:


Has anyone seen this issue on the 0.92 release?  Could we add some logging so that 
we will have some clue when it happens again?

 Closed parent region present in Hlog.lastSeqWritten
 ---

 Key: HBASE-5312
 URL: https://issues.apache.org/jira/browse/HBASE-5312
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.90.7


 This is in reference to the mail sent in the dev mailing list
 Closed parent region present in Hlog.lastSeqWritten.
 The scenario described is
 We had a region that was split into two daughters.  When the hlog roll tried 
 to flush the region there was an entry in the HLog.lastSeqWritten that was 
 not flushed or removed from the lastSeqWritten during the parent close.
 Because this flush was not happening, subsequent flushes were getting blocked
 {code}
  05:06:44,422 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=122, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:06:44,422 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:10:48,666 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=123, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:10:48,666 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:14:46,075 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=124, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:14:46,075 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 requester=null
  05:15:41,584 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Too many
  hlogs: logs=125, maxlogs=32; forcing flush of 1 regions(s):
  2acaf8e3acfd2e8a5825a1f6f0aca4a8
  05:15:41,584 WARN org.apache.hadoop.hbase.regionserver.LogRoller: Failed
  to schedule flush of 2acaf8e3acfd2e8a5825a1f6f0aca4a8r=null,
 {code}
 Lets see what happened for the region 2acaf8e3acfd2e8a5825a1f6f0aca4a8
 {code}
 2012-01-06 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Renaming flushed file at 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815
  to 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123
 2012-01-06 00:30:58,946 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Instantiated 
 Htable_UFDR_016,049790700093168-0456520,1325809837958.0ebe5bd7fcbc09ee074d5600b9d4e062.
 2012-01-06 00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: 
 Added 
 hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123,
  entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m
 2012-01-06 00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished snapshotting, commencing flushing stores
 2012-01-06 00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
 Finished memstore flush of ~133.5m for region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 
 21816ms, sequenceid=20312223, compaction requested=true
 2012-01-06 00:30:59,787 DEBUG 
 org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
 for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 
 because regionserver20020.cacheFlusher; priority=0, compaction queue size=5840
 {code}
 A user-triggered split has been issued to this region, which can be seen in 
 the above logs.
 The flushing of this region has resulted in a seq id 20312223.
 The region has been split and the parent region has been closed
 {code}
 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: 
 Starting split of region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.: 
  disabling compactions & flushes
 00:31:13,694 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates 
 disabled for region 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
 00:31:13,718 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed 
 Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
 00:31:39,552 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Offlined 

[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-08 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203675#comment-13203675
 ] 

Jimmy Xiang commented on HBASE-5221:


Since 0.23, Hadoop has re-organized the folder structure. The jars go under 
each individual module like hdfs, mapreduce, util and so on (under 
share/hadoop).  The common one is under share/hadoop/common.

I am not very clear about the story behind it either.  Todd should know this much 
better.

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-08 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203807#comment-13203807
 ] 

Jimmy Xiang commented on HBASE-5221:


The problem is that when I run the hbase shell, it complains that those files are 
missing, and ClassNotFound org.apache.hadoop.util.PlatformName.

We need to fix it.

The script is already looking under the Hadoop installation tree, just in the wrong 
place.

I don't think this fix will break anything.  We can use this fix until 
HBASE-5286 is resolved.

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.94.0

 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5221) bin/hbase script doesn't look for Hadoop jars in the right place in trunk layout

2012-02-08 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203810#comment-13203810
 ] 

Jimmy Xiang commented on HBASE-5221:


OK, let me close it as a dup.  I will probably just use the fix for myself until 
HBASE-5286 is resolved.

 bin/hbase script doesn't look for Hadoop jars in the right place in trunk 
 layout
 

 Key: HBASE-5221
 URL: https://issues.apache.org/jira/browse/HBASE-5221
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: Todd Lipcon
Assignee: Jimmy Xiang
 Fix For: 0.94.0

 Attachments: hbase-5221.txt


 Running against an 0.24.0-SNAPSHOT hadoop:
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-common*.jar: No such file or 
 directory
 ls: cannot access /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-hdfs*.jar: 
 No such file or directory
 ls: cannot access 
 /home/todd/ha-demo/hadoop-0.24.0-SNAPSHOT/hadoop-mapred*.jar: No such file or 
 directory
 The jars are rooted deeper in the hierarchy.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5353) HA/Distributed HMaster via RegionServers

2012-02-08 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203909#comment-13203909
 ] 

Jimmy Xiang commented on HBASE-5353:


Another option is not to have a master at all: every region server can do the work a 
master currently does, and ZK coordinates them.
For example, once a region server dies, all other region servers know about it 
and all try to run the dead-server cleanup, but only one will actually do it.  The 
drawback here is too much ZK interaction.
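
For example, the "only one will actually do it" part could be as simple as racing to create an ephemeral znode (a sketch only; the znode path is made up and its parent is assumed to exist):

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class DeadServerCleanup {
  // Every live region server calls this when it notices a dead server.
  static boolean tryClaimCleanup(ZooKeeper zk, String deadServerName)
      throws KeeperException, InterruptedException {
    String znode = "/hbase/cleanup/" + deadServerName;
    try {
      // Only one creator succeeds; everyone else gets NodeExistsException.
      zk.create(znode, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;   // this server won the race and runs the cleanup
    } catch (KeeperException.NodeExistsException e) {
      return false;  // someone else is already handling it
    }
  }
}
{code}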

 HA/Distributed HMaster via RegionServers
 

 Key: HBASE-5353
 URL: https://issues.apache.org/jira/browse/HBASE-5353
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.94.0
Reporter: Jesse Yates
Priority: Minor

 Currently, the HMaster node must be considered a 'special' node (single point 
 of failure), meaning that the node must be protected more than the other 
 commodity machines. It should be possible to instead have the HMaster be much 
 more available, either in a distributed sense (meaning a big rewrite) or with 
 multiple instances and automatic failover. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5317) Fix TestHFileOutputFormat to work against hadoop 0.23

2012-02-04 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200575#comment-13200575
 ] 

Jimmy Xiang commented on HBASE-5317:


@Ted, did you run it with Hadoop 0.23?

 Fix TestHFileOutputFormat to work against hadoop 0.23
 -

 Key: HBASE-5317
 URL: https://issues.apache.org/jira/browse/HBASE-5317
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.0, 0.92.0
Reporter: Gregory Chanan
Assignee: Gregory Chanan
 Attachments: HBASE-5317-v0.patch


 Running
 mvn -Dhadoop.profile=23 test -P localTests 
 -Dtest=org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
 yields this on 0.92:
 Failed tests:   
 testColumnFamilyCompression(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  HFile for column family info-A not found
 Tests in error: 
   test_TIMERANGE(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat): 
 /home/gchanan/workspace/apache92/target/test-data/276cbd0c-c771-4f81-9ba8-c464c9dd7486/test_TIMERANGE_present/_temporary/0/_temporary/_attempt_200707121733_0001_m_00_0
  (Is a directory)
   
 testMRIncrementalLoad(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
   
 testMRIncrementalLoadWithSplit(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 It looks like on trunk, this also results in an error:
   
 testExcludeMinorCompaction(org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat):
  TestTable
 I have a patch that fixes testColumnFamilyCompression and test_TIMERANGE, but 
 haven't fixed the other 3 yet.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5310) HConnectionManager server cache key enhancement

2012-02-01 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197984#comment-13197984
 ] 

Jimmy Xiang commented on HBASE-5310:


@Ted, thanks for the review and integration.

 HConnectionManager server cache key enhancement
 ---

 Key: HBASE-5310
 URL: https://issues.apache.org/jira/browse/HBASE-5310
 Project: HBase
  Issue Type: Improvement
  Components: client
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 0.94.0

 Attachments: hbase-5310.txt


 HConnectionManager uses the deprecated HServerAddress to create the server cache key, 
 which needs to resolve the address every time.
 It would be better to use HRegionLocation.getHostnamePort() instead.
 In our cluster we have a DNS issue: resolving an address sometimes fails, 
 which kills the application since a runtime 
 exception (IllegalArgumentException) is thrown at 
 HServerAddress.getResolvedAddress.  This change will fix that issue as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?

2012-01-31 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197440#comment-13197440
 ] 

Jimmy Xiang commented on HBASE-5281:


I think it is safer to retry a certain number of times before aborting.
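
In other words, something like this around the znode creation (a generic sketch; 'attemptCreate' stands in for the real callback/resubmit logic):

{code}
public class RetryBeforeAbort {
  // Retry a bounded number of times with a simple backoff; only when the
  // retries are exhausted do we let the caller abort.
  static void createWithRetries(Runnable attemptCreate, int maxRetries)
      throws InterruptedException {
    for (int i = 0; ; i++) {
      try {
        attemptCreate.run();
        return;             // success
      } catch (RuntimeException e) {
        if (i >= maxRetries) {
          throw e;          // out of retries: caller aborts as before
        }
        Thread.sleep(1000L * (i + 1));
      }
    }
  }
}
{code}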

 Should a failure in creating an unassigned node abort the master?
 -

 Key: HBASE-5281
 URL: https://issues.apache.org/jira/browse/HBASE-5281
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: Harsh J
Assignee: Harsh J
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5281.patch


 In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the 
 following condition:
 {code}
 if (rc != 0) {
   // This is result code.  If non-zero, need to resubmit.
   LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
     "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
   this.zkw.abort("Connectionloss writing unassigned at " + path +
     ", rc=" + rc, null);
   return;
 }
 {code}
 While a similar structure inside {{ExistsUnassignedAsyncCallback}} (which the 
 above is linked to), does not have such a force abort.
 Do we really require the abort statement here, or can we make do without?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5281) Should a failure in creating an unassigned node abort the master?

2012-01-31 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197472#comment-13197472
 ] 

Jimmy Xiang commented on HBASE-5281:


The issue Harsh reported is from a customer using CDH3u2, which doesn't have the 
recoverable ZK feature.

I think the recoverable ZK feature in 0.92.0 should have fixed this issue.

 Should a failure in creating an unassigned node abort the master?
 -

 Key: HBASE-5281
 URL: https://issues.apache.org/jira/browse/HBASE-5281
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.5
Reporter: Harsh J
Assignee: Harsh J
 Fix For: 0.94.0, 0.92.1

 Attachments: HBASE-5281.patch


 In {{AssignmentManager}}'s {{CreateUnassignedAsyncCallback}}, we have the 
 following condition:
 {code}
 if (rc != 0) {
   // This is result code.  If non-zero, need to resubmit.
   LOG.warn("rc != 0 for " + path + " -- retryable connectionloss -- " +
     "FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2");
   this.zkw.abort("Connectionloss writing unassigned at " + path +
     ", rc=" + rc, null);
   return;
 }
 {code}
 While a similar structure inside {{ExistsUnassignedAsyncCallback}} (which the 
 above is linked to), does not have such a force abort.
 Do we really require the abort statement here, or can we make do without?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5210) HFiles are missing from an incremental load

2012-01-23 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191310#comment-13191310
 ] 

Jimmy Xiang commented on HBASE-5210:


Any fix in getRandomFilename will just reduce the chance of a file name 
collision.  Since this is a rare case, I think it may be better to just fail 
the task if it fails to commit the files in moveTaskOutputs(), without 
overwriting the existing files.  In HDFS 0.23, rename() takes an option not to 
overwrite.  With Hadoop 0.20, we can just do our best to check for any 
conflicts before committing the files.
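For illustration, a rough sketch of the kind of pre-commit conflict check I 
mean (the method shape is hypothetical; the real change would live in the 
committer's moveTaskOutputs path):
{code}
// Rough sketch, not the actual patch: before committing a task's HFile into
// the final output folder, fail the task if the destination already exists,
// rather than silently overwriting another task's output.
private void commitFile(FileSystem fs, Path taskOutput, Path finalOutput)
    throws IOException {
  if (fs.exists(finalOutput)) {
    // On Hadoop 0.20, rename() cannot refuse to overwrite, so the best we
    // can do is detect the collision ourselves and fail the task.
    throw new IOException("Destination already exists: " + finalOutput
        + ", refusing to overwrite another task's output");
  }
  if (!fs.rename(taskOutput, finalOutput)) {
    throw new IOException("Failed to rename " + taskOutput
        + " to " + finalOutput);
  }
}
{code}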

 HFiles are missing from an incremental load
 ---

 Key: HBASE-5210
 URL: https://issues.apache.org/jira/browse/HBASE-5210
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.2
 Environment: HBase 0.90.2 with Hadoop-0.20.2 (with durable sync).  
 RHEL 2.6.18-164.15.1.el5.  4 node cluster (1 master, 3 slaves)
Reporter: Lawrence Simpson
 Attachments: HBASE-5210-crazy-new-getRandomFilename.patch


 We run an overnight map/reduce job that loads data from an external source 
 and adds that data to an existing HBase table.  The input files have been 
 loaded into hdfs.  The map/reduce job uses the HFileOutputFormat (and the 
 TotalOrderPartitioner) to create HFiles which are subsequently added to the 
 HBase table.  On at least two separate occasions (that we know of), a range 
 of output would be missing for a given day.  The range of keys for the 
 missing values corresponded to those of a particular region.  This implied 
 that a complete HFile somehow went missing from the job.  Further 
 investigation revealed the following:
 Two different reducers (running in separate JVMs and thus separate class 
 loaders) in the same server can end up using the same file names for their 
 HFiles.  The scenario is as follows:
 1. Both reducers start near the same time.
 2. The first reducer reaches the point where it wants to write its first file.
 3. It uses the StoreFile class, which contains a static Random object that is 
 initialized by default using a timestamp.
 4. The file name is generated using the random number generator.
 5. The file name is checked against other existing files.
 6. The file is written into temporary files in a directory named after the 
 reducer attempt.
 7. The second reduce task reaches the same point, but its StoreFile class 
 (which is now in the file system's cache) gets loaded within the time 
 resolution of the OS and thus initializes its Random object with the same 
 seed as the first task.
 8. The second task also checks for an existing file with the name generated 
 by the random number generator and finds no conflict, because each task is 
 writing files in its own temporary folder.
 9. The first task finishes and gets its temporary files committed to the real 
 folder specified for output of the HFiles.
 10. The second task then reaches its own conclusion and commits its files 
 (moveTaskOutputs).  The released Hadoop code just overwrites any files with 
 the same name, with no warning messages.  The first task's HFiles just go 
 missing.
 Note: the reducers here are NOT different attempts at the same reduce task.  
 They are different reduce tasks, so data is really lost.
 I am currently testing a fix in which I have added code to the Hadoop 
 FileOutputCommitter.moveTaskOutputs method to check for a conflict with
 an existing file in the final output folder and to rename the HFile if
 needed.  This may not be appropriate for all uses of FileOutputFormat.
 So I have put this into a new class which is then used by a subclass of
 HFileOutputFormat.  Subclassing of FileOutputCommitter itself was a bit 
 more of a problem due to private declarations.
 I don't know if my approach is the best fix for the problem.  If someone
 more knowledgeable than myself deems that it is, I will be happy to share
 what I have done and by that time I may have some information on the
 results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5210) HFiles are missing from an incremental load

2012-01-23 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13191351#comment-13191351
 ] 

Jimmy Xiang commented on HBASE-5210:


I like this one.  It's really simple and clean.

 HFiles are missing from an incremental load
 ---

 Key: HBASE-5210
 URL: https://issues.apache.org/jira/browse/HBASE-5210
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.2
 Environment: HBase 0.90.2 with Hadoop-0.20.2 (with durable sync).  
 RHEL 2.6.18-164.15.1.el5.  4 node cluster (1 master, 3 slaves)
Reporter: Lawrence Simpson
 Attachments: HBASE-5210-crazy-new-getRandomFilename.patch


 We run an overnight map/reduce job that loads data from an external source 
 and adds that data to an existing HBase table.  The input files have been 
 loaded into hdfs.  The map/reduce job uses the HFileOutputFormat (and the 
 TotalOrderPartitioner) to create HFiles which are subsequently added to the 
 HBase table.  On at least two separate occasions (that we know of), a range 
 of output would be missing for a given day.  The range of keys for the 
 missing values corresponded to those of a particular region.  This implied 
 that a complete HFile somehow went missing from the job.  Further 
 investigation revealed the following:
 Two different reducers (running in separate JVMs and thus separate class 
 loaders) in the same server can end up using the same file names for their 
 HFiles.  The scenario is as follows:
 1. Both reducers start near the same time.
 2. The first reducer reaches the point where it wants to write its first file.
 3. It uses the StoreFile class, which contains a static Random object that is 
 initialized by default using a timestamp.
 4. The file name is generated using the random number generator.
 5. The file name is checked against other existing files.
 6. The file is written into temporary files in a directory named after the 
 reducer attempt.
 7. The second reduce task reaches the same point, but its StoreFile class 
 (which is now in the file system's cache) gets loaded within the time 
 resolution of the OS and thus initializes its Random object with the same 
 seed as the first task.
 8. The second task also checks for an existing file with the name generated 
 by the random number generator and finds no conflict, because each task is 
 writing files in its own temporary folder.
 9. The first task finishes and gets its temporary files committed to the real 
 folder specified for output of the HFiles.
 10. The second task then reaches its own conclusion and commits its files 
 (moveTaskOutputs).  The released Hadoop code just overwrites any files with 
 the same name, with no warning messages.  The first task's HFiles just go 
 missing.
 Note: the reducers here are NOT different attempts at the same reduce task.  
 They are different reduce tasks, so data is really lost.
 I am currently testing a fix in which I have added code to the Hadoop 
 FileOutputCommitter.moveTaskOutputs method to check for a conflict with
 an existing file in the final output folder and to rename the HFile if
 needed.  This may not be appropriate for all uses of FileOutputFormat.
 So I have put this into a new class which is then used by a subclass of
 HFileOutputFormat.  Subclassing of FileOutputCommitter itself was a bit 
 more of a problem due to private declarations.
 I don't know if my approach is the best fix for the problem.  If someone
 more knowledgeable than myself deems that it is, I will be happy to share
 what I have done and by that time I may have some information on the
 results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-17 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187816#comment-13187816
 ] 

Jimmy Xiang commented on HBASE-5196:


Yes, the test suite on 0.90 with the patch passed.

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If region split fails after PONR, it relies on the master ServerShutdown 
 handler to fix it.  However, if the master doesn't get a chance to fix it, 
 there will be a hole in the region chain.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187318#comment-13187318
 ] 

Jimmy Xiang commented on HBASE-5196:


I attached a patch for 0.90 branch: hbase-5196_0.90.txt

Could anyone please check it in?

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If region split fails after PONR, it relies on the master ServerShutdown 
 handler to fix it.  However, if the master doesn't get a chance to fix it, 
 there will be a hole in the region chain.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-16 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187332#comment-13187332
 ] 

Jimmy Xiang commented on HBASE-5196:


@Ted, I ran the test suite, and verified the fix on CDH3u3.
Let me run the test suite on 0.90 now. 


 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt, hbase-5196_0.90.txt


 If region split fails after PONR, it relies on the master ServerShutdown 
 handler to fix it.  However, if the master doesn't get a chance to fix it, 
 there will be a hole in the region chain.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5136) Redundant MonitoredTask instances in case of distributed log splitting retry

2012-01-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185688#comment-13185688
 ] 

Jimmy Xiang commented on HBASE-5136:


Instead of reusing the same status object, can we abort the original one?

{code}
waitForSplittingCompletion(batch, status);
if (batch.done != batch.installed) {
  batch.isDead = true;
  tot_mgr_log_split_batch_err.incrementAndGet();
  LOG.warn("error while splitting logs in " + logDirs +
    " installed = " + batch.installed + " but only " + batch.done + " done");
  // <== update the status message and abort it here
  throw new IOException("error or interrupt while splitting logs in "
    + logDirs + " Task = " + batch);
}
{code}
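
Something along these lines, as a rough sketch (assuming the MonitoredTask 
status object exposes an abort(String) method):
{code}
// Rough sketch: mark the original status aborted before throwing, so a retry
// can create a fresh MonitoredTask without leaving a stale RUNNING entry.
String msg = "error while splitting logs in " + logDirs
    + " installed = " + batch.installed + " but only " + batch.done + " done";
status.abort(msg);
throw new IOException(msg + " Task = " + batch);
{code}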

 Redundant MonitoredTask instances in case of distributed log splitting retry
 

 Key: HBASE-5136
 URL: https://issues.apache.org/jira/browse/HBASE-5136
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Assignee: Zhihong Yu
 Attachments: 5136.txt


 In case of log splitting retry, the following code would be executed multiple 
 times:
 {code}
   public long splitLogDistributed(final List<Path> logDirs) throws 
 IOException {
 MonitoredTask status = TaskMonitor.get().createStatus(
   "Doing distributed log split in " + logDirs);
 {code}
 leading to multiple MonitoredTask instances.
 User may get confused by multiple distributed log splitting entries for the 
 same region server on the master UI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor

2012-01-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185694#comment-13185694
 ] 

Jimmy Xiang commented on HBASE-5174:


Failed or aborted tasks should not be displayed after the retry has succeeded. 
Otherwise, won't it cause confusion?

 Coalesce aborted tasks in the TaskMonitor
 -

 Key: HBASE-5174
 URL: https://issues.apache.org/jira/browse/HBASE-5174
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.92.1


 Some tasks can get repeatedly canceled like flushing when splitting is going 
 on, in the logs it looks like this:
 {noformat}
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 {noformat}
 But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the 
 regions. Basically 1000x:
 {noformat}
 Tue Jan 10 19:28:29 UTC 2012  Flushing 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec 
 ago)   Not flushing since writes not enabled (since 31sec ago)
 {noformat}
 It's ugly and I'm sure some users will freak out seeing this, plus you have 
 to scroll down all the way to see your regions. Coalescing consecutive 
 aborted tasks seems like a good solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5174) Coalesce aborted tasks in the TaskMonitor

2012-01-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185701#comment-13185701
 ] 

Jimmy Xiang commented on HBASE-5174:


I meant we cannot just show the failed or aborted tasks longer.  We should 
also show the succeeded or retrying task as well, if it failed before and the 
failed task is still showing.

 Coalesce aborted tasks in the TaskMonitor
 -

 Key: HBASE-5174
 URL: https://issues.apache.org/jira/browse/HBASE-5174
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0, 0.92.1


 Some tasks can get repeatedly canceled like flushing when splitting is going 
 on, in the logs it looks like this:
 {noformat}
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 2012-01-10 19:28:29,164 DEBUG 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
 because memory above low water=1.6g
 2012-01-10 19:28:29,164 INFO 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap 
 pressure
 2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
 NOT flushing memstore for region 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, 
 writesEnabled=false
 {noformat}
 But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the 
 regions. Basically 1000x:
 {noformat}
 Tue Jan 10 19:28:29 UTC 2012  Flushing 
 test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. ABORTED (since 31sec 
 ago)   Not flushing since writes not enabled (since 31sec ago)
 {noformat}
 It's ugly and I'm sure some users will freak out seeing this, plus you have 
 to scroll down all the way to see your regions. Coalescing consecutive 
 aborted tasks seems like a good solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185760#comment-13185760
 ] 

Jimmy Xiang commented on HBASE-5196:


I have a simple fix. When the master starts up, fix up all the missing 
daughters as the ServerShutdown handler does.

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 If region split fails after PONR, it relies on the master ServerShutdown 
 handler to fix it.  However, if the master doesn't get a chance to fix it, 
 there will be a hole in the region chain.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5196) Failure in region split after PONR could cause region hole

2012-01-13 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185856#comment-13185856
 ] 

Jimmy Xiang commented on HBASE-5196:


Yes, it is good.  Thanks Ted.

These failed tests passed on my box.

 Failure in region split after PONR could cause region hole
 --

 Key: HBASE-5196
 URL: https://issues.apache.org/jira/browse/HBASE-5196
 Project: HBase
  Issue Type: Bug
  Components: master, regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5196-v2.txt


 If region split fails after PONR, it relies on the master ServerShutdown 
 handler to fix it.  However, if the master doesn't get a chance to fix it, 
 there will be a hole in the region chain.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5150) Fail in a thread may not fail a test, clean up log splitting test

2012-01-11 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184221#comment-13184221
 ] 

Jimmy Xiang commented on HBASE-5150:


Those failed tests passed on my local box.

 Fail in a thread may not fail a test, clean up log splitting test
 -

 Key: HBASE-5150
 URL: https://issues.apache.org/jira/browse/HBASE-5150
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: hbase-5150.txt, hbase_5150_v3.patch


 This is to clean up some tests for HBASE-5081.  The Assert.fail method in a 
 separate thread will terminate the thread, but may not fail the test.
 We can use a Callable, so that we can get the error when getting the result. 
 Some documentation to explain the test will be helpful too.
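 For example, a minimal sketch of the Callable pattern (illustrative only; 
 someCondition() is a stand-in for whatever the worker thread checks):
 {code}
 // Minimal illustration: an assertion failure inside the Callable surfaces as
 // an ExecutionException when the test thread calls get(), so the test fails,
 // unlike Assert.fail() in a bare Thread, which only kills that thread.
 ExecutorService executor = Executors.newSingleThreadExecutor();
 Future<Boolean> result = executor.submit(new Callable<Boolean>() {
   @Override
   public Boolean call() throws Exception {
     Assert.assertTrue("checked in worker thread", someCondition());
     return Boolean.TRUE;
   }
 });
 assertTrue(result.get());  // propagates the worker's failure, failing the test
 executor.shutdown();
 {code}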

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5150) Fail in a thread may not fail a test, clean up log splitting test

2012-01-11 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184234#comment-13184234
 ] 

Jimmy Xiang commented on HBASE-5150:


@Prakash and Ted, are you ok with this patch? I changed the 3sec wait time to 
2sec.

 Fail in a thread may not fail a test, clean up log splitting test
 -

 Key: HBASE-5150
 URL: https://issues.apache.org/jira/browse/HBASE-5150
 Project: HBase
  Issue Type: Test
Affects Versions: 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: hbase-5150.txt, hbase_5150_v3.patch


 This is to clean up some tests for HBASE-5081.  The Assert.fail method in a 
 separate thread will terminate the thread, but may not fail the test.
 We can use a Callable, so that we can get the error when getting the result. 
 Some documentation to explain the test will be helpful too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181407#comment-13181407
 ] 

Jimmy Xiang commented on HBASE-5081:


It turns out all my region servers died.  I restarted them (rs) all, and things 
are looking better now.  One folder is completed. Two more to go.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 5081-deleteNode-with-while-loop.txt, 
 HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, 
 distributed_log_splitting_screen_shot2.png, 
 distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181451#comment-13181451
 ] 

Jimmy Xiang commented on HBASE-5081:


Now, all logs are split. I am happy with the patch.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 5081-deleteNode-with-while-loop.txt, 
 HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, 
 distributed_log_splitting_screen_shot2.png, 
 distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-06 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181457#comment-13181457
 ] 

Jimmy Xiang commented on HBASE-5081:


@Stack, yes, it will screw up the cluster (7 nodes).

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 5081-deleteNode-with-while-loop.txt, 
 HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, 
 distributed_log_splitting_screen_shot2.png, 
 distributed_log_splitting_screenshot3.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2012-01-05 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181087#comment-13181087
 ] 

Jimmy Xiang commented on HBASE-5081:


It hangs again.  In the region server log, I saw some DFS issue.  Let me 
restart the cluster.  Hopefully, it will move on.

 Distributed log splitting deleteNode races against splitLog retry 
 --

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Prakash Khemani
 Fix For: 0.92.0

 Attachments: 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 0001-HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 5081-deleteNode-with-while-loop.txt, 
 HBASE-5081-jira-Distributed-log-splitting-deleteNode.patch, 
 distributed-log-splitting-screenshot.png, 
 distributed_log_splitting_screen_shot2.png, hbase-5081-patch-v6.txt, 
 hbase-5081-patch-v7.txt, hbase-5081_patch_for_92_v4.txt, 
 hbase-5081_patch_v5.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 rc testing, we found distributed log splitting hangs 
 there forever.  Please see attached screen shot.
 I looked into it and here is what happened I think:
 1. One rs died, the servershutdownhandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. Servershutdownhandler retried the log splitting;
 4. During the retrial, it created these three tasks again, and put them in a 
 hashmap (tasks);
 5. The asynchronously deletion in step 2 finally happened for one task, in 
 the callback, it removed one
 task in the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found out that task is 
 unassigned, and it is not
 in the hashmap, so it created a new orphan task.
 7.  All three tasks failed, but that task created in step 6 is an orphan so 
 the batch.err counter was one short,
 so the log splitting hangs there and keeps waiting for the last task to 
 finish which is never going to happen.
 So I think the problem is step 2.  The fix is to make deletion sync, instead 
 of async, so that the retry will have
 a clean start.
 Async deleteNode will mess up with split log retrial.  In extreme situation, 
 if async deleteNode doesn't happen
 soon enough, some node created during the retrial could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177847#comment-13177847
 ] 

Jimmy Xiang commented on HBASE-5099:


TestReplication is flaky.  But it works on my ubuntu box.
Let me take a look.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region sever the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177860#comment-13177860
 ] 

Jimmy Xiang commented on HBASE-5099:


I tried to debug this test case, but it doesn't stop at the changes I made.

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region sever the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region assignment may block server shutdown handler for the region sever the root region was on

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177886#comment-13177886
 ] 

Jimmy Xiang commented on HBASE-5099:


TestReplication#queueFailover has a bug; that's why it is flaky:

https://issues.apache.org/jira/browse/HBASE-5112

 ZK event thread waiting for root region assignment may block server shutdown 
 handler for the region sever the root region was on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5112) TestReplication#queueFailover flaky due to code error

2011-12-30 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177887#comment-13177887
 ] 

Jimmy Xiang commented on HBASE-5112:


@Ted, could you please give this patch a try on your MacBook?  I could not 
reproduce the failure on my box.
I looked into the code carefully, and this fix should make this test case no 
longer flaky.

 TestReplication#queueFailover flaky due to code error
 -

 Key: HBASE-5112
 URL: https://issues.apache.org/jira/browse/HBASE-5112
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: hbase-5112.patch


 In TestReplication#queueFailover, the second scan is not reset for each new 
 scan.  A subsequent scan may not be able to scan the whole table, so it cannot 
 get all the data and the test fails.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177417#comment-13177417
 ] 

Jimmy Xiang commented on HBASE-5099:


@Ted, thanks!

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-29 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13177446#comment-13177446
 ] 

Jimmy Xiang commented on HBASE-5099:


There is no harm in calling shutdownNow() even if awaitTermination has not timed 
out.

So this should be fine:

if (executor.awaitTermination(timeout, TimeUnit.MILLISECONDS)
    && result.isDone()) {
  Boolean recovered = result.get();
  if (recovered != null) {
    return recovered.booleanValue();
  }
}
executor.shutdownNow();


 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0, 0.94.0

 Attachments: 5099.92, ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099-v2.patch, 
 hbase-5099-v3.patch, hbase-5099-v4.patch, hbase-5099-v5.patch, 
 hbase-5099-v6.patch, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-28 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176778#comment-13176778
 ] 

Jimmy Xiang commented on HBASE-5099:


Cool, let me submit a patch.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png, hbase-5099.patch


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176430#comment-13176430
 ] 

Jimmy Xiang commented on HBASE-5099:


This is good.  If we introduce a timeout, I prefer to do it for 
tryRecoveringExpiredZKSession().
The reason is that, other than waitForAssignment, several other places have the 
waiting logic as well, such as bulkAssign(), waitForRoot(), 
this.activeMasterManager.blockUntilBecomingActiveMaster(startupStatus), etc.

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 sever the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.  
 SplitLogManager
 installed the tasks asynchronously, then started to wait for them to complete.
 The task znodes were not created actually.  The requests were just queued.
 At this time, the zookeeper connection expired.  HMaster tried to recover the 
 expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the
 new master again.  It tried to assign root and meta.
 Because the dead RS got the old root region, the master needs to wait for the 
 log splitting to complete.
 This waiting holds the zookeeper event thread.  So the async create split 
 task is never retried since
 there is only one event thread, which is waiting for the root region assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5099) ZK event thread waiting for root region while server shutdown handler waiting for event thread to finish distributed log splitting to recover the region sever the root

2011-12-27 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176447#comment-13176447
 ] 

Jimmy Xiang commented on HBASE-5099:


tryRecoveringExpiredZKSession() is only called by abortNow(), which is called 
by abort(), which is called by the event thread.  I was thinking of running 
this whole method in another thread via an executor service and timing it out 
after a certain period, for example, 5 minutes; on timeout, fail the recovery 
and let the master abort.

This way, we don't have to add a timeout to every method.  The regular master 
startup, which also calls assignRootAndMeta(), is not impacted.

However, if we know that most likely only waitForAssignment() takes a long 
time, we can add a timeout to that method only.  But I am not so sure.
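As a rough illustration of that approach, here is a minimal sketch (not the 
actual HMaster code) of running a recovery routine on a separate executor and 
failing it after a bounded wait; the class name, the recoverWithTimeout() 
method, and the timeout value are hypothetical:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RecoveryTimeoutSketch {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Run the recovery off the event thread and give up after the timeout.
  boolean recoverWithTimeout(Callable<Boolean> recovery, long timeoutMinutes) {
    Future<Boolean> future = executor.submit(recovery);
    try {
      return future.get(timeoutMinutes, TimeUnit.MINUTES);
    } catch (TimeoutException e) {
      future.cancel(true);  // interrupt the stuck recovery
      return false;         // caller treats this as a failed recovery and aborts
    } catch (InterruptedException | ExecutionException e) {
      return false;
    }
  }
}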

 ZK event thread waiting for root region while server shutdown handler waiting 
 for event thread to finish distributed log splitting to recover the region 
 server the root region is on
 

 Key: HBASE-5099
 URL: https://issues.apache.org/jira/browse/HBASE-5099
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
 Attachments: ZK-event-thread-waiting-for-root.png, 
 distributed-log-splitting-hangs.png


 A RS died.  The ServerShutdownHandler kicked in and started the log splitting.
 SplitLogManager installed the tasks asynchronously, then started to wait for 
 them to complete.
 The task znodes were not actually created; the requests were just queued.
 At this time, the zookeeper connection expired, and HMaster tried to recover 
 the expired ZK session.
 During the recovery, a new zookeeper connection was created.  However, this 
 master became the new master again and tried to assign root and meta.
 Because the dead RS was hosting the old root region, the master needs to wait 
 for the log splitting to complete.
 This waiting holds the zookeeper event thread, so the async create-split task 
 is never retried, since there is only one event thread, and it is waiting for 
 the root region to be assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-22 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175155#comment-13175155
 ] 

Jimmy Xiang commented on HBASE-5081:


@Stack, it is not an orphan task.  It happens in ServerShutdownHandler.  It 
retries the log splitting if the previous one failed for any reason:

line 178:
this.services.getExecutorService().submit(this);

It keeps retrying.  Should we have a limit here?
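
For illustration, a hedged sketch of capping such resubmission; the attempt 
counter and the limit below are hypothetical, not the actual 
ServerShutdownHandler code:

import java.util.concurrent.ExecutorService;

public class BoundedResubmitSketch implements Runnable {
  private static final int MAX_ATTEMPTS = 3;  // hypothetical cap on total attempts
  private final ExecutorService executorService;
  private int attempts = 0;

  BoundedResubmitSketch(ExecutorService executorService) {
    this.executorService = executorService;
  }

  @Override
  public void run() {
    attempts++;
    boolean succeeded = splitLogs();
    if (!succeeded && attempts < MAX_ATTEMPTS) {
      executorService.submit(this);  // resubmit, but only a bounded number of times
    }
  }

  // Placeholder for the real log-splitting work.
  private boolean splitLogs() {
    return false;
  }
}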

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0

 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-22 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175158#comment-13175158
 ] 

Jimmy Xiang commented on HBASE-5081:


@Prakash, this one didn't happen when the master starts up.  It happened when 
one region server died.


 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.92.0

 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081-patch-v6.txt, hbase-5081-patch-v7.txt, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174180#comment-13174180
 ] 

Jimmy Xiang commented on HBASE-5081:


I am working on a patch now.  I think a synchronous deleteNode is clean; it 
will give the retry a fresh start.
But it may take a while if there are too many files.  Yes, for the long term, 
we can think about how to do what Stack says.
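
For reference, a minimal sketch (not the actual SplitLogManager code) of what 
a synchronous delete of a task znode looks like with the ZooKeeper client API; 
the class name, method name, and task path are assumptions for the example:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class SyncDeleteSketch {
  // Block until the task znode is really gone, so the retry starts clean.
  static void deleteTaskNode(ZooKeeper zk, String taskPath)
      throws KeeperException, InterruptedException {
    try {
      zk.delete(taskPath, -1);  // -1 matches any version; returns only after deletion
    } catch (KeeperException.NoNodeException e) {
      // Node already gone; that is still a clean start for the retry.
    }
  }
}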

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174187#comment-13174187
 ] 

Jimmy Xiang commented on HBASE-5081:


Can we deleteNode only if the task completed successfully?  If it did not 
complete, let the node stay there.  In that case, when the retry happens, it 
will see the old node, but that is ok.  The new task in the hashmap won't be 
deleted either.
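
A rough sketch of that alternative, assuming (this is not the real HBase code) 
that the caller knows whether the task succeeded; failed tasks simply keep 
their znode:

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DeleteOnSuccessSketch {
  // Remove the task znode only for successful tasks; leave failed ones so the
  // retry finds them instead of racing with an in-flight async delete.
  static void finishTask(ZooKeeper zk, String taskPath, boolean succeeded)
      throws KeeperException, InterruptedException {
    if (succeeded) {
      zk.delete(taskPath, -1);
    }
    // On failure: do nothing; the old node stays and the retry picks it up.
  }
}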

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174251#comment-13174251
 ] 

Jimmy Xiang commented on HBASE-5081:


The patch is for both 0.92 and 0.94 actually.

 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, patch_for_92.txt, patch_for_92_v2.txt, 
 patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5081) Distributed log splitting deleteNode races against splitLog retry

2011-12-21 Thread Jimmy Xiang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174357#comment-13174357
 ] 

Jimmy Xiang commented on HBASE-5081:


I am thinking about sync delete for the failure case.  What do you think?

I am looking into the test failure now.



 Distributed log splitting deleteNode races against splitLog retry 
 ---

 Key: HBASE-5081
 URL: https://issues.apache.org/jira/browse/HBASE-5081
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.92.0, 0.94.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: distributed-log-splitting-screenshot.png, 
 hbase-5081_patch_for_92_v4.txt, hbase-5081_patch_v5.txt, patch_for_92.txt, 
 patch_for_92_v2.txt, patch_for_92_v3.txt


 Recently, during 0.92 RC testing, we found that distributed log splitting hangs 
 there forever.  Please see the attached screenshot.
 I looked into it and here is what I think happened:
 1. One RS died, and the ServerShutdownHandler found it out and started the 
 distributed log splitting;
 2. All three tasks failed, so the three tasks were deleted, asynchronously;
 3. ServerShutdownHandler retried the log splitting;
 4. During the retry, it created these three tasks again and put them in a 
 hashmap (tasks);
 5. The asynchronous deletion from step 2 finally happened for one task; in 
 the callback, it removed one task from the hashmap;
 6. One of the newly submitted tasks' zookeeper watcher found that the task is 
 unassigned, and it is not in the hashmap, so it created a new orphan task.
 7. All three tasks failed, but the task created in step 6 is an orphan, so 
 the batch.err counter was one short, and the log splitting hangs there, waiting 
 for the last task to finish, which is never going to happen.
 So I think the problem is step 2.  The fix is to make the deletion sync, instead 
 of async, so that the retry will have a clean start.
 Async deleteNode will mess up the split log retry.  In an extreme situation, if 
 the async deleteNode doesn't happen soon enough, some node created during the 
 retry could be deleted.
 deleteNode should be sync.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



