[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-22 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Attachment: (was: hbase-9390-part2.patch)

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-22 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Attachment: hbase-9390-part2.patch

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, hbase-9390.patch, 
> hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-23 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Attachment: hbase-9390-part2-v2.patch

Thanks [~nkeywal] and [~te...@apache.org] for the reviews! I added a test case and 
incorporated feedback from Ted.


> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-23 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775839#comment-13775839
 ] 

Jeffrey Zhong commented on HBASE-9390:
--

The QA run on the v2 patch is clean. I'll commit the v2 patch tomorrow evening if 
there are no objections. Thanks.

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-24 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776750#comment-13776750
 ] 

Jeffrey Zhong commented on HBASE-9390:
--

[~saint@gmail.com] Just pinging you to see if it's all right to check in the 
part2-v2 patch late this afternoon, since you might cut another RC. This patch 
maintains the existing preWALRestore semantics in the new distributed log replay. 
It also includes some code cleanup and a default config setting change, which 
result in faster log replay recovery. Thanks.

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9640) Increment of loadSequence in CoprocessorHost#loadInstance() is thread-unsafe

2013-09-25 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13778193#comment-13778193
 ] 

Jeffrey Zhong commented on HBASE-9640:
--

+1. Looks good to me.

> Increment of loadSequence in CoprocessorHost#loadInstance() is thread-unsafe 
> -
>
> Key: HBASE-9640
> URL: https://issues.apache.org/jira/browse/HBASE-9640
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 9640.txt
>
>
> {code}
> E env = createEnvironment(implClass, impl, priority, ++loadSequence, 
> conf);
> {code}
> Increment of loadSequence doesn't have proper synchronization.
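The race described above is the classic lost-update problem: a pre-increment on a shared long compiles to a read-modify-write that two threads can interleave. A minimal standalone sketch of the usual fix, an AtomicLong (the names below are illustrative, not HBase's actual fields):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LoadSequenceDemo {
    // ++loadSequence on a plain long is a read-modify-write; two threads can
    // read the same value and one increment is lost. AtomicLong's
    // incrementAndGet() performs the whole update as one atomic operation.
    static long concurrentCount(int nThreads, int perThread) {
        AtomicLong loadSequence = new AtomicLong(0);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    loadSequence.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loadSequence.get();
    }

    public static void main(String[] args) {
        // No increments are lost: 8 threads x 10000 increments each.
        System.out.println(concurrentCount(8, 10000)); // prints 80000
    }
}
```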



[jira] [Created] (HBASE-9665) Region gets lost when balancer & SSH both trying to assign

2013-09-26 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9665:


 Summary: Region gets lost when balancer & SSH both trying to 
assign 
 Key: HBASE-9665
 URL: https://issues.apache.org/jira/browse/HBASE-9665
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong
Priority: Critical


In summary, a server dies and its regions are re-assigned, while right before SSH 
runs, the balancer starts assigning one of the dead server's regions elsewhere.

The balancer's assignment was preempted by the SSH assignment:
{code}
2013-09-25 11:55:32,854 INFO Priority.RpcServer.handler=7,port=60020 
regionserver.HRegionServer: Received CLOSE for the 
region:6deb1bfefe8cbdb443084efe919fdeb7 , which we are already trying to OPEN. 
Cancelling OPENING.
{code}

The SSH assignment (by GeneralBulkAssigner) also failed, due to:
{code}
2013-09-25 11:55:32,927 WARN  [RS_OPEN_REGION-hor15n09:60020-2] 
zookeeper.ZKAssign: regionserver:60020-0x14153d449d30ad0 Attempt to transition 
the unassigned node for 6deb1bfefe8cbdb443084efe919fdeb7 from 
M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to 
transition was hor15n09.gq1.ygridcore.net,60020,1380109280320 not the expected 
hor15n07.gq1.ygridcore.net,60020,1380109890414
{code}

In the end, the region 6deb1bfefe8cbdb443084efe919fdeb7 is lost.


Below is the master log; you can see both the balancer and SSH trying to assign 
the region at around the same time:

{code}
2013-09-25 11:55:32,731 INFO  [MASTER_SERVER_OPERATIONS-hor15n05:6-4] 
master.RegionStates: Transitioning {6deb1bfefe8cbdb443084efe919fdeb7 
state=PENDING_CLOSE, ts=1380110132710, 
server=hor15n12.gq1.ygridcore.net,60020,1380109596307} will be handled by SSH 
for hor15n12.gq1.ygridcore.net,60020,1380109596307

...

2013-09-25 11:55:32,849 INFO  
[hor15n05.gq1.ygridcore.net,6,1380108611483-BalancerChore] 
master.RegionStates: Transitioned {6deb1bfefe8cbdb443084efe919fdeb7 
state=OFFLINE, ts=1380110132768, server=null} to 
{6deb1bfefe8cbdb443084efe919fdeb7 state=PENDING_OPEN, ts=1380110132849, 
server=hor15n07.gq1.ygridcore.net,60020,1380109890414}

...

2013-09-25 11:55:32,898 INFO  
[hor15n05.gq1.ygridcore.net,6,1380108611483-GeneralBulkAssigner-1] 
master.RegionStates: Transitioned {6deb1bfefe8cbdb443084efe919fdeb7 
state=OFFLINE, ts=1380110132861, server=null} to 
{6deb1bfefe8cbdb443084efe919fdeb7 state=PENDING_OPEN, ts=1380110132898, 
server=hor15n09.gq1.ygridcore.net,60020,1380109280320}
{code}

Since SSH forces region assignment but doesn't recreate the offline znode, the 
later region opening fails with the following error. I suggest recreating the 
offline znode whenever we force a region assignment (forceNewPlan=true); this 
should have low impact.

{code}
2013-09-25 11:55:32,927 WARN  [RS_OPEN_REGION-hor15n09:60020-2] 
zookeeper.ZKAssign: regionserver:60020-0x14153d449d30ad0 Attempt to transition 
the unassigned node for 6deb1bfefe8cbdb443084efe919fdeb7 from 
M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the server that tried to 
transition was hor15n09.gq1.ygridcore.net,60020,1380109280320 not the expected 
hor15n07.gq1.ygridcore.net,60020,1380109890414
{code}





[jira] [Commented] (HBASE-9514) Prevent region from assigning before log splitting is done

2013-09-26 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13779146#comment-13779146
 ] 

Jeffrey Zhong commented on HBASE-9514:
--

[~jxiang] Will your patch cover the HBASE-9665 scenario, where the balancer starts 
to move a region while the server hosting the region dies, and the region is then 
lost because the ZK RIT state is clobbered by the two concurrent assignments?

> Prevent region from assigning before log splitting is done
> --
>
> Key: HBASE-9514
> URL: https://issues.apache.org/jira/browse/HBASE-9514
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Blocker
> Attachments: trunk-9514_v1.patch, trunk-9514_v2.patch, 
> trunk-9514_v3.patch
>
>
> If a region is assigned before log splitting is done by the server shutdown 
> handler, the edits belonging to this region in the hlogs of the dead server 
> will be lost.
> Generally this is not an issue if users don't assign/unassign a region from 
> hbase shell or via hbase admin. These commands are marked for experts only in 
> the hbase shell help too.  However, chaos monkey doesn't care.
> If we can prevent from assigning such regions in a bad time, it would make 
> things a little safer.



[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-27 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, everyone, for the reviews! I've integrated the part2-v2 patch into 0.96 
and trunk.

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Commented] (HBASE-9672) LoadTestTool NPE's when -num_tables is given, but -tn is not

2013-09-27 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780176#comment-13780176
 ] 

Jeffrey Zhong commented on HBASE-9672:
--

+1. Looks good to me!

> LoadTestTool NPE's when -num_tables is given, but -tn is not
> 
>
> Key: HBASE-9672
> URL: https://issues.apache.org/jira/browse/HBASE-9672
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Minor
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9672_v1.patch
>
>
> {code}
> bin/hbase org.apache.hadoop.hbase.util.LoadTestTool -write 1:1:1 -num_tables 
> 10 -num_keys 1000
> {code}
> results in an NPE. It expects the -tn argument. 



[jira] [Commented] (HBASE-9664) ArrayIndexOutOfBoundsException may be thrown in TestZKSecretWatcher

2013-09-27 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780200#comment-13780200
 ] 

Jeffrey Zhong commented on HBASE-9664:
--

The patch looks good to me. +1

> ArrayIndexOutOfBoundsException may be thrown in TestZKSecretWatcher
> ---
>
> Key: HBASE-9664
> URL: https://issues.apache.org/jira/browse/HBASE-9664
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
> Attachments: 9664.txt
>
>
> In our internal Jenkins build, I saw failure in TestZKSecretWatcher:
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 2
>   at 
> org.apache.hadoop.hbase.security.token.TestZKSecretWatcher.setupBeforeClass(TestZKSecretWatcher.java:87)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> {code}
> This was due to i being 1, which (because % binds more tightly than +) results 
> in an index of 2 being used in the following statement:
> {code}
>   KEY_SLAVE = tmp[ i+1 % 2 ];
> {code}
> See http://docs.oracle.com/javase/tutorial/java/nutsandbolts/operators.html
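The failure follows directly from Java operator precedence: `%` binds more tightly than `+`, so `i+1 % 2` parses as `i + (1 % 2)`. A small standalone demonstration of the bug and the parenthesized fix:

```java
public class PrecedenceDemo {
    // % binds tighter than +, so i+1 % 2 parses as i + (1 % 2) = i + 1.
    static int buggyIndex(int i) {
        return i + 1 % 2;
    }

    // Parenthesizing gives the intended wrap-around index.
    static int fixedIndex(int i) {
        return (i + 1) % 2;
    }

    public static void main(String[] args) {
        // For a 2-element array, an index of 2 is out of bounds.
        System.out.println(buggyIndex(1)); // prints 2
        System.out.println(fixedIndex(1)); // prints 0
    }
}
```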



[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-27 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780410#comment-13780410
 ] 

Jeffrey Zhong commented on HBASE-9390:
--

[~saint@gmail.com] Thanks for the good comments.

{quote}
The replay method got redone. What was objective?
{quote}
The reason is that we have to pass the original WALEdits in order to skip 
replaying an individual WALEdit when preWALRestore (which can only operate on the 
original WALEdit) returns true. If we pass mutations (which are extracted from a 
WALEdit), we can't skip them, because we can't reconstruct the original WALEdit.

{quote}
Is this doBatchOp that different from the current one? Could they be the same?
{quote}
This is related to the first point. Since we pass WALEdits from a client, we 
convert the mutations to MutationProto, and then inside doBatchOp those 
MutationProtos are converted back to mutations. In addition, we don't need a 
MultiResponse returned, so the extra loop the current doBatchOp uses to construct 
the MultiResponse is eliminated from the overridden doBatchOp. 

{quote}
Looking more, would getReplayMutations be better as a static over under wal 
package?
...
Structurally it is also odd calling the cp preWAL down inside in 
getReplayMutations but the postWAL is up in the calling method.
{quote}
Good question. The original thought was to save an extra loop when skipping 
WALEdits by calling preWALRestore inside getReplayMutations as we construct the 
list of mutations. It's possible to create a static function that converts one 
WALEntry at a time, with the caller invoking preWALRestore. I can submit an 
addendum to cover that. 


{quote}
We could size these arrays rather than have them expand since we know edit 
count on way in?
{quote}
Good point. Yes for mutations, but it's hard for tmpEditMutations since we reuse 
it across all WALEntries. 
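The constraint discussed here can be sketched generically: a per-edit hook that can veto replay only works if the loop still holds the original edits. In the sketch below, the Predicate stands in for a coprocessor's preWALRestore hook and String stands in for a WALEdit; these are illustrative placeholders, not the real HBase coprocessor API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ReplaySketch {
    // Because the hook vetoes *original* edits, the replay loop must still
    // hold them; once edits are flattened into mutations, the veto can no
    // longer be applied.
    static List<String> replay(List<String> originalEdits,
                               Predicate<String> preRestoreBypass) {
        List<String> replayed = new ArrayList<>();
        for (String edit : originalEdits) {
            if (preRestoreBypass.test(edit)) {
                continue; // the hook asked to skip this edit entirely
            }
            replayed.add(edit); // stand-in for converting the edit to mutations
        }
        return replayed;
    }

    public static void main(String[] args) {
        // A hook that skips "edit-b".
        List<String> out = replay(List.of("edit-a", "edit-b", "edit-c"),
                e -> e.equals("edit-b"));
        System.out.println(out); // prints [edit-a, edit-c]
    }
}
```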


> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



[jira] [Reopened] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-28 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reopened HBASE-9390:
--


> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-28 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Status: Patch Available  (was: Reopened)

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.





[jira] [Updated] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-28 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9390:
-

Attachment: hbase-9390-review-addendum.patch

[~saint@gmail.com] The addendum patch addresses your review comments:
1) It moves getReplayMutations from HRegion.java to HLogSplitter.
2) It moves the preWALRestore call to the same level as postWALRestore.
3) It adds a comment in the overloaded doBatchOp for clarification.

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, 
> hbase-9390-review-addendum.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.





[jira] [Commented] (HBASE-9390) coprocessors observers are not called during a recovery with the new log replay algorithm

2013-09-29 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781615#comment-13781615
 ] 

Jeffrey Zhong commented on HBASE-9390:
--

Thanks [~saint@gmail.com] for the review. I've integrated the review 
addendum patch into 0.96 and trunk.

> coprocessors observers are not called during a recovery with the new log 
> replay algorithm
> -
>
> Key: HBASE-9390
> URL: https://issues.apache.org/jira/browse/HBASE-9390
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, MTTR
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Jeffrey Zhong
> Attachments: copro.patch, hbase-9390-part2.patch, 
> hbase-9390-part2-v2.patch, hbase-9390.patch, 
> hbase-9390-review-addendum.patch, hbase-9390-v2.patch
>
>
> See the patch to reproduce the issue: If we activate log replay we don't have 
> the events on WAL restore.
> Pinging [~jeffreyz], we discussed this offline.





[jira] [Commented] (HBASE-9688) Fix javadoc warning in HConnectionManager class javadoc

2013-09-30 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782531#comment-13782531
 ] 

Jeffrey Zhong commented on HBASE-9688:
--

Looks good to me.

> Fix javadoc warning in HConnectionManager class javadoc
> ---
>
> Key: HBASE-9688
> URL: https://issues.apache.org/jira/browse/HBASE-9688
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Attachments: 9688.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/7422/artifact/trunk/patchprocess/patchJavadocWarnings.txt
>  :
> {code}
> [WARNING] Javadoc Warnings
> [WARNING] 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java:193:
>  warning - End Delimiter } missing for possible See Tag in comment string: "A 
> non-instantiable class that manages creation of {@link HConnection}s.
> [WARNING] The simplest way to use this class is by using {@link 
> #createConnection(Configuration)}.
> [WARNING] This creates a new {@link HConnection} to the cluster that is 
> managed by the caller.
> [WARNING] From this {@link HConnection} {@link HTableInterface} 
> implementations are retrieved
> [WARNING] with {@link HConnection#getTable(byte[])}. Example:
> [WARNING] 
> [WARNING] {@code
> [WARNING] HConnection connection = 
> HConnectionManager.createConnection(config);
> [WARNING] HTableInterface table = connection.getTable("table1");
> [WARNING] try {
> [WARNING] // Use the table as needed, for a single operation and a single 
> thread
> [WARNING] } finally {
> [WARNING] table.close();
> [WARNING] connection.close();
> [WARNING] }
> [WARNING] 
> [WARNING] The following logic and API will be removed in the future:
> [WARNING] This class has a static Map of {@link HConnection} instances 
> keyed by
> {code}
> The @code tag is missing its right brace.





[jira] [Updated] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-03 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9709:
-

Fix Version/s: (was: 0.96.1)

> LogReplay throws NPE when no KVs to be replayed in a WALEdit
> 
>
> Key: HBASE-9709
> URL: https://issues.apache.org/jira/browse/HBASE-9709
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
>
> This is a regression from my recent check-in for HBASE-9390; below is the 
> exception stack:
> {code}
> 2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting 
> thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
> {code}





[jira] [Created] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-03 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9709:


 Summary: LogReplay throws NPE when no KVs to be replayed in a 
WALEdit
 Key: HBASE-9709
 URL: https://issues.apache.org/jira/browse/HBASE-9709
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong
Priority: Minor


This is a regression from my recent check-in for HBASE-9390; below is the 
exception stack:

{code}
2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting thread
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
{code}
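A hedged sketch of the kind of guard such a fix needs: entries whose edit carries no KVs are skipped before grouping, instead of being dereferenced. The types below are stand-ins, not the actual HLogSplitter classes:

```java
import java.util.ArrayList;
import java.util.List;

public class GroupEditsSketch {
    // Stand-in type, not the real HLogSplitter entry class.
    record WalEntry(String region, List<String> kvs) {}

    // Guard: a WAL entry can legitimately carry zero KVs; grouping code must
    // skip such entries rather than dereference their null or empty KV list.
    static List<WalEntry> nonEmptyEntries(List<WalEntry> entries) {
        List<WalEntry> result = new ArrayList<>();
        for (WalEntry e : entries) {
            if (e.kvs() == null || e.kvs().isEmpty()) {
                continue; // nothing to replay; skip to avoid the NPE path
            }
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<WalEntry> entries = List.of(
                new WalEntry("r1", List.of("kv1")),
                new WalEntry("r2", List.of())); // empty edit: must be skipped
        System.out.println(nonEmptyEntries(entries).size()); // prints 1
    }
}
```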





[jira] [Updated] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-03 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9709:
-

Affects Version/s: 0.96.0
Fix Version/s: 0.96.1
 Assignee: Jeffrey Zhong

> LogReplay throws NPE when no KVs to be replayed in a WALEdit
> 
>
> Key: HBASE-9709
> URL: https://issues.apache.org/jira/browse/HBASE-9709
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
>
> This is a regression from my recent check-in for HBASE-9390; below is the 
> exception stack:
> {code}
> 2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting 
> thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
> {code}





[jira] [Updated] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-03 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9709:
-

Attachment: hbase-9709.patch

> LogReplay throws NPE when no KVs to be replayed in a WALEdit
> 
>
> Key: HBASE-9709
> URL: https://issues.apache.org/jira/browse/HBASE-9709
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9709.patch
>
>
> This is a regression from my recent check-in for HBASE-9390; below is the 
> exception stack:
> {code}
> 2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting 
> thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
> {code}





[jira] [Updated] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-03 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9709:
-

Status: Patch Available  (was: Open)

> LogReplay throws NPE when no KVs to be replayed in a WALEdit
> 
>
> Key: HBASE-9709
> URL: https://issues.apache.org/jira/browse/HBASE-9709
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9709.patch
>
>
> This is a regression from my recent check-in for HBASE-9390; below is the 
> exception stack:
> {code}
> 2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting 
> thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
> {code}





[jira] [Updated] (HBASE-9709) LogReplay throws NPE when no KVs to be replayed in a WALEdit

2013-10-04 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9709:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
   0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks [~te...@apache.org] for the review! I've integrated the small change 
into 0.96 and trunk. Thanks.

> LogReplay throws NPE when no KVs to be replayed in a WALEdit
> 
>
> Key: HBASE-9709
> URL: https://issues.apache.org/jira/browse/HBASE-9709
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9709.patch
>
>
> This is a regression from my recent check-in for HBASE-9390; below is the 
> exception stack:
> {code}
> 2013-10-03 09:34:32,735 ERROR [WriterThread-1] wal.HLogSplitter: Exiting 
> thread
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.groupEditsByServer(HLogSplitter.java:1489)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogReplayOutputSink.append(HLogSplitter.java:1368)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:847)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:839)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:809)
> {code}





[jira] [Created] (HBASE-9723) TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE

2013-10-07 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9723:


 Summary: TestAsyncProcess#testFailAndSuccess & testThreadCreation 
are flaky on SUSE
 Key: HBASE-9723
 URL: https://issues.apache.org/jira/browse/HBASE-9723
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Minor
 Fix For: 0.98.0, 0.96.1


When TestAsyncProcess runs on SUSE, testFailAndSuccess & testThreadCreation 
fail intermittently with the following stacks:

Error Trace for testFailAndSuccess
{code}
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hbase.client.TestAsyncProcess.testFailAndSuccess(TestAsyncProcess.java:394)
{code}

Error trace for testThreadCreation
{code}
java.lang.AssertionError: expected:<1> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.hbase.client.TestAsyncProcess.testThreadCreation(TestAsyncProcess.java:728)
{code}





[jira] [Updated] (HBASE-9723) TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE

2013-10-07 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9723:
-

Attachment: hbase-9723.patch

I think the reason for the testFailAndSuccess failure is that the test doesn't 
really complete: the retry hasn't released the task count yet (the test only 
waits for hasError to become true, but there is still cleanup work pending), so 
the following put fails to be submitted.

For testThreadCreation, the work item for one server completes too quickly, so 
the second thread isn't created; instead the first thread is reused for the work 
item of another server. I added a one-second sleep to simulate workload so that 
the thread pool triggers one more thread.
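The thread-reuse behavior behind the flaky assertion can be reproduced with a plain cached thread pool. This is a standalone sketch of the effect, not the TestAsyncProcess code; the helper name and timings are illustrative only.

```java
import java.util.*;
import java.util.concurrent.*;

public class PoolReuseDemo {
    // Count the distinct worker threads used for two tasks that are
    // submitted with a small gap, each running for taskMillis.
    static int threadsUsed(long taskMillis) throws Exception {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newCachedThreadPool();
        Runnable task = () -> {
            names.add(Thread.currentThread().getName());
            try { Thread.sleep(taskMillis); } catch (InterruptedException e) {}
        };
        Future<?> f1 = pool.submit(task);
        Thread.sleep(100);                 // gap before the second work item
        Future<?> f2 = pool.submit(task);
        f1.get(); f2.get();
        pool.shutdown();
        return names.size();
    }

    public static void main(String[] args) throws Exception {
        // Instant tasks: the first worker is idle again by the time the
        // second task arrives, so the pool reuses it.
        System.out.println(threadsUsed(0));
        // Slow tasks: the first worker is still busy, so the pool has to
        // create a second thread.
        System.out.println(threadsUsed(500));
    }
}
```

This is the same reasoning as the fix above: artificially slowing the first work item forces the pool to spin up the second thread the test expects.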

> TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE
> --
>
> Key: HBASE-9723
> URL: https://issues.apache.org/jira/browse/HBASE-9723
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9723.patch
>
>
> When TestAsyncProcess runs on SUSE, testFailAndSuccess & 
> testThreadCreation fail intermittently with the following stacks:
> Error Trace for testFailAndSuccess
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testFailAndSuccess(TestAsyncProcess.java:394)
> {code}
> Error trace for testThreadCreation
> {code}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testThreadCreation(TestAsyncProcess.java:728)
> {code}





[jira] [Updated] (HBASE-9723) TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE

2013-10-07 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9723:
-

Status: Patch Available  (was: Open)

> TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE
> --
>
> Key: HBASE-9723
> URL: https://issues.apache.org/jira/browse/HBASE-9723
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9723.patch
>
>
> When TestAsyncProcess runs on SUSE, testFailAndSuccess & 
> testThreadCreation fail intermittently with the following stacks:
> Error Trace for testFailAndSuccess
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testFailAndSuccess(TestAsyncProcess.java:394)
> {code}
> Error trace for testThreadCreation
> {code}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testThreadCreation(TestAsyncProcess.java:728)
> {code}





[jira] [Commented] (HBASE-9730) Exceptions in multi operations are not handled correctly

2013-10-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789928#comment-13789928
 ] 

Jeffrey Zhong commented on HBASE-9730:
--

Good catch, [~enis]! I think your patch fixes the root cause. +1

> Exceptions in multi operations are not handled correctly
> 
>
> Key: HBASE-9730
> URL: https://issues.apache.org/jira/browse/HBASE-9730
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Blocker
> Fix For: 0.98.0, 0.96.0
>
> Attachments: hbase-9730_v1-0.96.patch, hbase-9730_v1.patch
>
>
> The symptoms are that, both ITBLL and ITLAV fail in their verification steps 
> complaining about lots of undefined rows. 
> {code}
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
> REFERENCED=199619372
> UNDEFINED=190084
> UNREFERENCED=190084
> {code}
> I think the problem is in HRegionServer.doBatchOp() where in case 
> HRegion.batchMutate() throws an exception, RegionActionResult indexes are not 
> set correctly.  





[jira] [Updated] (HBASE-9723) TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE

2013-10-09 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9723:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE
> --
>
> Key: HBASE-9723
> URL: https://issues.apache.org/jira/browse/HBASE-9723
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9723.patch
>
>
> When TestAsyncProcess runs on SUSE, testFailAndSuccess & 
> testThreadCreation fail intermittently with the following stacks:
> Error Trace for testFailAndSuccess
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testFailAndSuccess(TestAsyncProcess.java:394)
> {code}
> Error trace for testThreadCreation
> {code}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testThreadCreation(TestAsyncProcess.java:728)
> {code}





[jira] [Commented] (HBASE-9723) TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE

2013-10-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790660#comment-13790660
 ] 

Jeffrey Zhong commented on HBASE-9723:
--

Thanks [~nkeywal] for the review! I have integrated the fix into trunk and the 
0.96 branch, since the same test has recently been failing intermittently in the 
Apache build as well.

> TestAsyncProcess#testFailAndSuccess & testThreadCreation are flaky on SUSE
> --
>
> Key: HBASE-9723
> URL: https://issues.apache.org/jira/browse/HBASE-9723
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9723.patch
>
>
> When TestAsyncProcess runs on SUSE, testFailAndSuccess & 
> testThreadCreation fail intermittently with the following stacks:
> Error Trace for testFailAndSuccess
> {code}
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testFailAndSuccess(TestAsyncProcess.java:394)
> {code}
> Error trace for testThreadCreation
> {code}
> java.lang.AssertionError: expected:<1> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.hbase.client.TestAsyncProcess.testThreadCreation(TestAsyncProcess.java:728)
> {code}





[jira] [Commented] (HBASE-9696) Master recovery ignores online merge znode

2013-10-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791014#comment-13791014
 ] 

Jeffrey Zhong commented on HBASE-9696:
--

[~jxiang] Do you have a more concrete stack trace? From the code, when the 
master restarts, it reads the merging RIT znodes to reconstruct the RIT state in 
the following code:
{code}
  case RS_ZK_REGION_MERGING:
...
  handleRegionMerging(rt, prettyPrintedRegionName, sn);
...
break;
{code}

And in unassign (the move-region request) we handle the merging case as follows:
{code}
  if (isSplitOrSplittingOrMergedOrMerging(path)) {
LOG.debug(path + " is SPLIT or SPLITTING or MERGED or MERGING; 
" +
  "skipping unassign because region no longer exists -- its 
split or merge");
reassign = false; // no need to reassign for split/merged region
return;
  }
{code}

It seems to me that the issue in this JIRA should be a very rare case (not the 
normal code path), right? Thanks.

> Master recovery ignores online merge znode
> --
>
> Key: HBASE-9696
> URL: https://issues.apache.org/jira/browse/HBASE-9696
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9696.patch
>
>
> The online merge znode uses the new region to be created.  When the master 
> restarts, the new region is still unknown if the merging is not completed, so 
> the znode is ignored, which it should not be.  That means the two merging 
> regions could be moved around.  This could cause some data loss if we are not 
> lucky.





[jira] [Commented] (HBASE-9696) Master recovery ignores online merge znode

2013-10-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791216#comment-13791216
 ] 

Jeffrey Zhong commented on HBASE-9696:
--

{quote}
If hri is null, it returns false
{quote}
Thanks for pointing this out. It looks like the following few lines of your 
patch address the JIRA issue, while the rest is more of an enhancement. Is it 
possible to split the patch into two: one as a bug fix (to mitigate impact) and 
the rest as an enhancement?
{code}
+EventType et = rt.getEventType();
+if (hri == null && et != EventType.RS_ZK_REGION_MERGING

-final String encodedRegionName = regionInfo.getEncodedName();
-final String prettyPrintedRegionName = 
HRegionInfo.prettyPrint(encodedRegionName);
-LOG.info("Processing " + regionInfo.getRegionNameAsString() + " in state " 
+ et);
-
+final byte[] regionName = rt.getRegionName();
+final String encodedName = HRegionInfo.encodeRegionName(regionName);
+final String prettyPrintedRegionName = 
HRegionInfo.prettyPrint(encodedName);
+LOG.info("Processing " + prettyPrintedRegionName + " in state " + et);
 
-if (regionStates.isRegionInTransition(encodedRegionName)) {
+if (regionStates.isRegionInTransition(encodedName)) {
{code}

> Master recovery ignores online merge znode
> --
>
> Key: HBASE-9696
> URL: https://issues.apache.org/jira/browse/HBASE-9696
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9696.patch, trunk-9696_v2.patch
>
>
> The online merge znode uses the new region to be created.  When the master 
> restarts, the new region is still unknown if the merging is not completed, so 
> the znode is ignored, which it should not be.  That means the two merging 
> regions could be moved around.  This could cause some data loss if we are not 
> lucky.





[jira] [Commented] (HBASE-9696) Master recovery ignores online merge znode

2013-10-10 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792353#comment-13792353
 ] 

Jeffrey Zhong commented on HBASE-9696:
--

[~jxiang] What's the reason for introducing the four new states? Merging is a 
master-initiated operation, so we should have a corresponding in-memory state so 
that future region move requests can be cancelled. For splitting, before a 
region split starts it creates a splitting RIT znode, while a region move will 
try to create a closing RIT znode, so only one operation can succeed. In the 
master recovery scenario, the merging state can be restored with a few lines of 
code in the patch.

The reason I'm asking is that the four new states not only increase the risk of 
the patch (because you want to put it in 0.96) but also leave a more complicated 
region transition state machine for future maintenance.  Any chance you can cut 
the new states? Thanks.


> Master recovery ignores online merge znode
> --
>
> Key: HBASE-9696
> URL: https://issues.apache.org/jira/browse/HBASE-9696
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9696.patch, trunk-9696_v2.1.patch, 
> trunk-9696_v2.patch, trunk-9696_v3.patch
>
>
> The online merge znode uses the new region to be created.  When the master 
> restarts, the new region is still unknown if the merging is not completed, so 
> the znode is ignored, which it should not be.  That means the two merging 
> regions could be moved around.  This could cause some data loss if we are not 
> lucky.





[jira] [Commented] (HBASE-9696) Master recovery ignores online merge znode

2013-10-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792954#comment-13792954
 ] 

Jeffrey Zhong commented on HBASE-9696:
--

Thanks [~saint@gmail.com] and [~jxiang] for the clarifications. We can discuss 
the new AM design further, with a systematic model/approach for 
create/delete/move/merge/split handling, at the meetup. I'll leave comments on 
RB. Thanks.

> Master recovery ignores online merge znode
> --
>
> Key: HBASE-9696
> URL: https://issues.apache.org/jira/browse/HBASE-9696
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 0.96-9696_v3.2.patch, trunk-9696.patch, 
> trunk-9696_v2.1.patch, trunk-9696_v2.patch, trunk-9696_v3.1.patch, 
> trunk-9696_v3.2.patch, trunk-9696_v3.patch, trunk-9696_v3.patch
>
>
> The online merge znode uses the new region to be created.  When the master 
> restarts, the new region is still unknown if the merging is not completed, so 
> the znode is ignored, which it should not be.  That means the two merging 
> regions could be moved around.  This could cause some data loss if we are not 
> lucky.





[jira] [Commented] (HBASE-9696) Master recovery ignores online merge znode

2013-10-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793050#comment-13793050
 ] 

Jeffrey Zhong commented on HBASE-9696:
--

I read through the patch and it seems OK to me (+1). Since it's hard to cover 
all the variations introduced by the new states, I agree with you: let's fix new 
issues as they surface if there are overlooked factors. Thanks.

> Master recovery ignores online merge znode
> --
>
> Key: HBASE-9696
> URL: https://issues.apache.org/jira/browse/HBASE-9696
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 0.96-9696_v3.2.patch, 0.96-9696_v3.3.patch, 
> trunk-9696.patch, trunk-9696_v2.1.patch, trunk-9696_v2.patch, 
> trunk-9696_v3.1.patch, trunk-9696_v3.2.patch, trunk-9696_v3.3.patch, 
> trunk-9696_v3.patch, trunk-9696_v3.patch
>
>
> The online merge znode uses the new region to be created.  When the master 
> restarts, the new region is still unknown if the merging is not completed, so 
> the znode is ignored, which it should not be.  That means the two merging 
> regions could be moved around.  This could cause some data loss if we are not 
> lucky.





[jira] [Created] (HBASE-9768) Two issues in AsyncProcess

2013-10-15 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9768:


 Summary: Two issues in AsyncProcess
 Key: HBASE-9768
 URL: https://issues.apache.org/jira/browse/HBASE-9768
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong
Assignee: Nicolas Liochon
 Fix For: 0.98.0, 0.96.1


There may be two issues in the AsyncProcess code:

1) In HTable#backgroundFlushCommits, we have the following code:

{code}
  if (ap.hasError()) {
if (!clearBufferOnFail) {
  // if clearBufferOnFailed is not set, we're supposed to keep the 
failed operation in the
  //  write buffer. This is a questionable feature kept here for 
backward compatibility
  writeAsyncBuffer.addAll(ap.getFailedOperations());
}
RetriesExhaustedWithDetailsException e = ap.getErrors();
ap.clearErrors();
throw e;
  }
{code}

In a rare situation like the following: while some updates are ongoing, a client 
calls Put (internally, backgroundFlushCommits gets triggered). Then comes the 
issue: the first ap.hasError() returns false and the second ap.hasError() 
returns true, so we could throw an exception to the caller while 
writeAsyncBuffer isn't empty (some updates are still ongoing). If a client then 
retries with different values for the same keys, we could end up in a 
nondeterministic state.
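The hazard described above is a check-then-act race: the caller observes the error flag while some tasks are still in flight. The following standalone toy (not the AsyncProcess code; all names and timings are illustrative) shows how an error can be visible to the caller while other submitted work is still outstanding.

```java
import java.util.*;
import java.util.concurrent.*;

public class FlushRaceDemo {
    public static void main(String[] args) throws Exception {
        // Toy async process: one task "fails" immediately, another is
        // still running when the caller checks for errors.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch firstFailed = new CountDownLatch(1);
        List<Future<?>> inFlight = new ArrayList<>();

        inFlight.add(pool.submit(firstFailed::countDown)); // fails fast
        inFlight.add(pool.submit(() -> sleep(300)));       // still ongoing

        firstFailed.await(); // caller sees the error flag raised here...
        boolean allDone = inFlight.stream().allMatch(Future::isDone);
        System.out.println(allDone); // ...while work is still outstanding

        for (Future<?> f : inFlight) f.get(); // proper wait-for-completion
        pool.shutdown();
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) {}
    }
}
```

Throwing to the caller at the point where `allDone` is false is exactly the state in which a retry with new values could interleave with the still-pending writes.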

2) The following code only updates the cache for the first row. We should update 
the cache for all the regions inside resultForRS, because actions are sent to 
multiple regions per RS:

{code}
  if (failureCount++ == 0) { // We're doing this once per location.
hConnection.updateCachedLocations(this.tableName, row.getRow(), 
result, location);
if (errorsByServer != null) {
  errorsByServer.reportServerError(location);
  canRetry = errorsByServer.canRetryMore();
}
  }
{code}






[jira] [Created] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9776:


 Summary: Test Load And Verify Fails with TableNotEnabledException
 Key: HBASE-9776
 URL: https://issues.apache.org/jira/browse/HBASE-9776
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Minor


Occasionally IntegrationTestLoadAndVerify fails with the following error. This 
is caused by an RPC retry; the first attempt actually went through 
successfully.

{code}
2013-10-10 
19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
 org.apache.hadoop.hbase.TableNotEnabledException: IntegrationTestLoadAndVerify
2013-10-10 19:55:54,340|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
2013-10-10 19:55:54,341|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
2013-10-10 19:55:54,342|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
{code}





[jira] [Updated] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9776:
-

Description: 
Occasionally IntegrationTestLoadAndVerify failed with the following error. This 
is caused by RPC retry and the first attempt actually went through successfully 
and the second retry attempt fails because the table is disabled by the first 
attempt.

{code}
2013-10-10 
19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
 org.apache.hadoop.hbase.TableNotEnabledException: IntegrationTestLoadAndVerify
2013-10-10 19:55:54,340|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
2013-10-10 19:55:54,341|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
2013-10-10 19:55:54,342|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
{code}

  was:
Occasionally IntegrationTestLoadAndVerify failed with the following error. This 
is caused by RPC retry and the first attempt actually went through 
successfully.  

{code}
2013-10-10 
19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
 org.apache.hadoop.hbase.TableNotEnabledException: IntegrationTestLoadAndVerify
2013-10-10 19:55:54,340|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
2013-10-10 19:55:54,341|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
2013-10-10 19:55:54,342|beaver.machine|INFO|at 
org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
{code}


> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
>
> Occasionally IntegrationTestLoadAndVerify fails with the following error. 
> This is caused by an RPC retry: the first attempt actually went through 
> successfully, and the second attempt fails because the table was already 
> disabled by the first.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}





[jira] [Commented] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796208#comment-13796208
 ] 

Jeffrey Zhong commented on HBASE-9776:
--

Yes, disableTable is not an idempotent operation, so the subsequent retry fails. 
We can't remove the enabled-state check, because we use it to synchronize 
operations, for example when one client is trying to make schema changes while 
another is trying to delete the same table.

My plan is to use HBaseTestingUtility#deleteTable instead, letting the 
application client eat the exception if it happens and proceed with the delete, 
because we're in the cleanup phase.
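The tolerant-cleanup pattern described here can be sketched with a toy non-idempotent disable operation. This is a self-contained illustration in the spirit of HBaseTestingUtility#deleteTable, not HBase API code; the Table class and exception type are stand-ins.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class TolerantCleanup {
    // Toy stand-in for a table's enabled flag.
    static class Table {
        final AtomicBoolean enabled = new AtomicBoolean(true);

        // Non-idempotent: disabling an already-disabled table throws,
        // which is what an RPC retry of disableTable runs into.
        void disable() {
            if (!enabled.compareAndSet(true, false)) {
                throw new IllegalStateException("TableNotEnabledException");
            }
        }
    }

    // Cleanup-phase helper: swallow the "not enabled" error and report
    // whether the table ended up disabled, so the delete can proceed.
    static boolean disableIgnoringNotEnabled(Table t) {
        try {
            t.disable();
        } catch (IllegalStateException ignored) {
            // The first attempt already disabled it; that's fine here.
        }
        return !t.enabled.get();
    }

    public static void main(String[] args) {
        Table t = new Table();
        t.disable();                               // first attempt succeeds
        boolean ok = disableIgnoringNotEnabled(t); // the "retry" is tolerated
        System.out.println(ok);
    }
}
```

Eating the exception is safe only because cleanup's goal (table disabled, then deleted) is already satisfied; the same check must stay strict on the normal operation path.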

> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
>
> Occasionally IntegrationTestLoadAndVerify fails with the following error. 
> This is caused by an RPC retry: the first attempt actually went through 
> successfully, and the second attempt fails because the table was already 
> disabled by the first.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}





[jira] [Updated] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9776:
-

Status: Patch Available  (was: Open)

> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9776.patch
>
>
> Occasionally IntegrationTestLoadAndVerify fails with the following error. 
> This is caused by an RPC retry: the first attempt actually went through 
> successfully, and the second attempt fails because the table was already 
> disabled by the first.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}





[jira] [Updated] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9776:
-

Attachment: hbase-9776.patch

The fix is simple: use HBaseTestingUtility#deleteTable to delete the table in 
the cleanup phase. That utility ignores the TableNotEnabledException and 
proceeds with the deletion.

> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9776.patch
>
>
> Occasionally IntegrationTestLoadAndVerify failed with the following error. 
> This is caused by RPC retry and the first attempt actually went through 
> successfully and the second retry attempt fails because the table is disabled 
> by the first attempt.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-15 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796305#comment-13796305
 ] 

Jeffrey Zhong commented on HBASE-9776:
--

[~saint@gmail.com] I'm afraid you hit a different issue. The stack trace you 
posted suggests there was a half-done deletion earlier, and subsequent retries 
all failed because of it. Since delete/disable/create table operations aren't 
idempotent, wrapping them in executeCallable is problematic. I guess we need a 
FATE-like model for table operations.
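A hypothetical illustration of the retry hazard described above, with simplified stand-ins rather than the real HBase RPC path: a disable-table call is not idempotent, so a blind client retry after a lost reply fails with TableNotEnabledException even though the first attempt succeeded on the server.

```java
public class NonIdempotentRetryDemo {
  static class TableNotEnabledException extends RuntimeException {}

  static boolean enabled = true;

  // Server-side handler: rejects disabling an already-disabled table.
  static void disableTable() {
    if (!enabled) throw new TableNotEnabledException();
    enabled = false;
  }

  public static void main(String[] args) {
    String outcome;
    try {
      disableTable();   // first attempt succeeds, but the reply is "lost"
      disableTable();   // the client retry hits the now-disabled table
      outcome = "ok";
    } catch (TableNotEnabledException e) {
      outcome = "retry failed";
    }
    System.out.println(outcome);  // → retry failed
  }
}
```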


> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9776.patch
>
>
> Occasionally IntegrationTestLoadAndVerify failed with the following error. 
> This is caused by RPC retry and the first attempt actually went through 
> successfully and the second retry attempt fails because the table is disabled 
> by the first attempt.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

2013-10-16 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797344#comment-13797344
 ] 

Jeffrey Zhong commented on HBASE-9773:
--

I checked the fix and I think it opens the door to double assignment. The 
closeRegion request is processed asynchronously: even after we send the close 
RPC to the region's hosting region server, the region could be opened on 
another region server before the old one has actually closed it. Then we end 
up with a double-assignment issue.

In addition, we potentially have a data-loss situation. 
AM#forceRegionStateToOffline doesn't wait for the region to be fully closed. 
If the region is opened while the old RS is still flushing, some store files 
may not be opened in the new location. Worse, if the old RS crashes, WAL 
splitting will be skipped and we have permanent data loss.

[~jxiang] Could you please double-check the above? Meanwhile, let me try to 
come up with an addendum patch. Thanks.

> Master aborted when hbck asked the master to assign a region that was already 
> online
> 
>
> Key: HBASE-9773
> URL: https://issues.apache.org/jira/browse/HBASE-9773
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9773.patch, trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version 
> created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace 
> region due to some security exceptions. hbck INCORRECTLY assumed the region 
> was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => 
> hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a,
>  deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=6] 
> master.HMaster: Client=hbase//172.18.145.105 assign 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer 
> hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: Sent CLOSE to 
> gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for 
> region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, 
> and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated 
> random 
> plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., 
> src=, 
> dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 
> 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a 
> state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : 
> {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, 
> boolean forceNewPlan){code} is the method that does all the above. This was 
> called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Reopened] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

2013-10-16 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reopened HBASE-9773:
--


> Master aborted when hbck asked the master to assign a region that was already 
> online
> 
>
> Key: HBASE-9773
> URL: https://issues.apache.org/jira/browse/HBASE-9773
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9773.patch, trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version 
> created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace 
> region due to some security exceptions. hbck INCORRECTLY assumed the region 
> was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => 
> hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a,
>  deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=6] 
> master.HMaster: Client=hbase//172.18.145.105 assign 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer 
> hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: Sent CLOSE to 
> gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for 
> region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, 
> and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated 
> random 
> plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., 
> src=, 
> dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 
> 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a 
> state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : 
> {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, 
> boolean forceNewPlan){code} is the method that does all the above. This was 
> called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

2013-10-16 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797498#comment-13797498
 ] 

Jeffrey Zhong commented on HBASE-9773:
--

[~jxiang] I checked your addendum. You wait recursively for the region to be 
fully closed. There is one problem with that: the caller can be blocked for 
quite a while. If the request is handled by a thread pool, we could block the 
whole pool. In addition, we may run out of stack if for whatever reason we 
have a bad RS.

I think it's better to create a runnable and resubmit it to wait for the 
region to close and then re-assign.
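A minimal sketch of the resubmission idea suggested above, with hypothetical names rather than the real HBase AssignmentManager API: instead of blocking or recursing in a pool thread until the region closes, the task checks once and, if the region is still closing, reschedules itself, freeing the thread in between checks.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class ResubmitAssignDemo {
  static class ReassignTask implements Runnable {
    final ScheduledExecutorService pool;
    final AtomicBoolean regionClosed;   // stand-in for the region's state
    final Runnable assignRegion;        // stand-in for the real re-assign

    ReassignTask(ScheduledExecutorService pool, AtomicBoolean regionClosed,
                 Runnable assignRegion) {
      this.pool = pool;
      this.regionClosed = regionClosed;
      this.assignRegion = assignRegion;
    }

    @Override public void run() {
      if (regionClosed.get()) {
        assignRegion.run();                        // safe to assign now
      } else {
        // Region still closing: give the thread back and retry shortly.
        pool.schedule(this, 50, TimeUnit.MILLISECONDS);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
    AtomicBoolean closed = new AtomicBoolean(false);
    CountDownLatch assigned = new CountDownLatch(1);
    pool.submit(new ReassignTask(pool, closed, assigned::countDown));
    // Simulate the RS finishing the close a little later.
    pool.schedule(() -> closed.set(true), 200, TimeUnit.MILLISECONDS);
    boolean ok = assigned.await(5, TimeUnit.SECONDS);
    pool.shutdown();
    System.out.println(ok ? "assigned after close" : "timed out");
  }
}
```

No pool thread is held for the whole wait; each poll occupies a thread only momentarily.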

> Master aborted when hbck asked the master to assign a region that was already 
> online
> 
>
> Key: HBASE-9773
> URL: https://issues.apache.org/jira/browse/HBASE-9773
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9773.addendum, trunk-9773.patch, 
> trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version 
> created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace 
> region due to some security exceptions. hbck INCORRECTLY assumed the region 
> was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => 
> hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a,
>  deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=6] 
> master.HMaster: Client=hbase//172.18.145.105 assign 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer 
> hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: Sent CLOSE to 
> gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for 
> region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, 
> and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated 
> random 
> plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., 
> src=, 
> dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 
> 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a 
> state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : 
> {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, 
> boolean forceNewPlan){code} is the method that does all the above. This was 
> called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

2013-10-16 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797615#comment-13797615
 ] 

Jeffrey Zhong commented on HBASE-9773:
--

With a runnable, we can wait a little and then resubmit it to the executor 
pool so that other runnables get a chance to run. For example, if a thread 
pool has at most 5 threads and you first submit 5 runnables that all block for 
a long time, as in the addendum, then subsequently submitted work items won't 
run until the first 5 complete, even though they themselves could finish very 
quickly.
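The starvation scenario described above can be demonstrated with plain java.util.concurrent (this is a generic illustration, not HBase code): five blocking runnables occupy every thread of a 5-thread pool, so a sixth, instant task cannot start until one of them finishes.

```java
import java.util.concurrent.*;

public class PoolStarvationDemo {
  // Returns true if the quick task managed to run while the pool was full.
  static boolean quickTaskRanWhileBlocked() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(5);
    CountDownLatch release = new CountDownLatch(1);
    // Five long-blocking tasks occupy every thread in the pool.
    for (int i = 0; i < 5; i++) {
      pool.submit(() -> {
        try { release.await(); } catch (InterruptedException ignored) {}
      });
    }
    Future<String> quick = pool.submit(() -> "done");  // queued behind 5
    Thread.sleep(200);
    boolean ranWhileBlocked = quick.isDone();          // expected: false
    release.countDown();                               // unblock the pool
    quick.get(5, TimeUnit.SECONDS);                    // now it completes
    pool.shutdown();
    return ranWhileBlocked;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("quick task ran while pool was blocked: "
        + quickTaskRanWhileBlocked());  // → false
  }
}
```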

> Master aborted when hbck asked the master to assign a region that was already 
> online
> 
>
> Key: HBASE-9773
> URL: https://issues.apache.org/jira/browse/HBASE-9773
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9773.addendum, trunk-9773.patch, 
> trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version 
> created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace 
> region due to some security exceptions. hbck INCORRECTLY assumed the region 
> was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => 
> hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a,
>  deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=6] 
> master.HMaster: Client=hbase//172.18.145.105 assign 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer 
> hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: Sent CLOSE to 
> gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for 
> region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, 
> and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated 
> random 
> plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., 
> src=, 
> dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 
> 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a 
> state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : 
> {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, 
> boolean forceNewPlan){code} is the method that does all the above. This was 
> called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9773) Master aborted when hbck asked the master to assign a region that was already online

2013-10-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798164#comment-13798164
 ] 

Jeffrey Zhong commented on HBASE-9773:
--

{quote}
I will enhance the addendum a little to have a timeout, just like in the assign 
case. I think we should not use runnable because the open will be retried 
usually. If the unassign is timed out, the pool won't be blocked.
{quote}
Since TimeoutMonitor is disabled by default, the open won't be retried. For 
this JIRA, timing out the unassign is a reasonable solution: in the SSH case 
the hosting server is already dead, so the unassign bails out quickly without 
hitting the timeout. In other situations, giving up the assignment is 
acceptable, because a region-assignment request is more of a hint to the 
system and the master doesn't have to assign a region on a user's request. 

> Master aborted when hbck asked the master to assign a region that was already 
> online
> 
>
> Key: HBASE-9773
> URL: https://issues.apache.org/jira/browse/HBASE-9773
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Jimmy Xiang
> Fix For: 0.98.0, 0.96.1
>
> Attachments: trunk-9773.addendum, trunk-9773.patch, 
> trunk-9773_v2.patch
>
>
> Came across this situation (with a version of 0.96 very close to RC5 version 
> created on 10/11):
> The sequence of events that happened:
> 1. The hbck tool couldn't communicate with the RegionServer hosting namespace 
> region due to some security exceptions. hbck INCORRECTLY assumed the region 
> was not deployed.
> In output.log (client side):
> {noformat}
> 2013-10-12 10:42:57,067|beaver.machine|INFO|ERROR: Region { meta => 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., hdfs => 
> hdfs://gs-hdp2-secure-1381559462-hbase-12.cs1cloud.internal:8020/apps/hbase/data/data/hbase/namespace/a0ac0825ba2d0830614e7f808f31787a,
>  deployed =>  } not deployed on any region server.
> 2013-10-12 10:42:57,067|beaver.machine|INFO|Trying to fix unassigned region...
> {noformat}
> 2. This led to the hbck tool trying to tell the master to "assign" the region.
> In master log (hbase-hbase-master-gs-hdp2-secure-1381559462-hbase-12.log):
> {noformat}
> 2013-10-12 10:52:35,960 INFO  [RpcServer.handler=4,port=6] 
> master.HMaster: Client=hbase//172.18.145.105 assign 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 3. The master went through the steps - sent a CLOSE to the RegionServer 
> hosting namespace region.
> From master log:
> {noformat}
> 2013-10-12 10:52:35,981 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: Sent CLOSE to 
> gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794 for 
> region hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.
> {noformat}
> 4. The master then tried to assign the namespace region to a region server, 
> and in the process ABORTED:
> From master log:
> {noformat}
> 2013-10-12 10:52:36,025 DEBUG [RpcServer.handler=4,port=6] 
> master.AssignmentManager: No previous transition plan found (or ignoring an 
> existing plan) for 
> hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a.; generated 
> random 
> plan=hri=hbase:namespace,,1381564449706.a0ac0825ba2d0830614e7f808f31787a., 
> src=, 
> dest=gs-hdp2-secure-1381559462-hbase-9.cs1cloud.internal,60020,1381564439807; 
> 4 (online=4, available=4) available servers, forceNewPlan=true
> 2013-10-12 10:52:36,026 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.security.access.AccessController]
> 2013-10-12 10:52:36,027 FATAL [RpcServer.handler=4,port=6] 
> master.HMaster: Unexpected state : {a0ac0825ba2d0830614e7f808f31787a 
> state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state : 
> {a0ac0825ba2d0830614e7f808f31787a state=OPEN, ts=1381564451344, 
> server=gs-hdp2-secure-1381559462-hbase-1.cs1cloud.internal,60020,1381564439794}
>  .. Cannot transit it to OFFLINE.
> {noformat}
> {code}AssignmentManager.assign(HRegionInfo region, boolean setOfflineInZK, 
> boolean forceNewPlan){code} is the method that does all the above. This was 
> called from the HMaster with true for both the boolean arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798216#comment-13798216
 ] 

Jeffrey Zhong commented on HBASE-9776:
--

Yes, I'll commit soon. Thanks.

> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Attachments: hbase-9776.patch
>
>
> Occasionally IntegrationTestLoadAndVerify failed with the following error. 
> This is caused by RPC retry and the first attempt actually went through 
> successfully and the second retry attempt fails because the table is disabled 
> by the first attempt.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9793) Offline a region before it's closed could cause double assignment

2013-10-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798232#comment-13798232
 ] 

Jeffrey Zhong commented on HBASE-9793:
--

I reviewed the patch. One minor thing: we should remove the following code. 
The rest looks good to me. It would be better to have a successful IT run 
before checking in. Thanks.
{code}
+if (server.isStopped() || server.isAborted()) {
+  LOG.info("Skip assigning " + region.getRegionNameAsString()
++ ", the server is stopped/aborted");
+}
{code}

Because we already have the following in assign. You could add the 
server.isAborted() check to the loop condition, though.

{code}
  for (int i = 1; i <= maximumAttempts && !server.isStopped(); i++) {
{code}

> Offline a region before it's closed could cause double assignment
> -
>
> Key: HBASE-9793
> URL: https://issues.apache.org/jira/browse/HBASE-9793
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: trunk-9793.patch
>
>
> The fix for HBASE-9773 could cause double assignment, as [~jeffreyz] pointed 
> out. Let's fix it in a separate jira instead of an addendum since there are 
> different opinions on how to fix it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9793) Offline a region before it's closed could cause double assignment

2013-10-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798261#comment-13798261
 ] 

Jeffrey Zhong commented on HBASE-9793:
--

{quote}
Can we change the loop condition instead so that we can have something in the 
log to find out what's going on?
{quote}
I think that's OK because we already log this info when a server is about to 
abort or stop. It's up to you, though.

> Offline a region before it's closed could cause double assignment
> -
>
> Key: HBASE-9793
> URL: https://issues.apache.org/jira/browse/HBASE-9793
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: trunk-9793.patch
>
>
> The fix for HBASE-9773 could cause double assignment, as [~jeffreyz] pointed 
> out. Let's fix it in a separate jira instead of an addendum since there are 
> different opinions on how to fix it.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9776) Test Load And Verify Fails with TableNotEnabledException

2013-10-17 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9776:
-

   Resolution: Fixed
Fix Version/s: 0.96.1
   0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review and comments! I've integrated the patch into 0.96 and 
trunk.

> Test Load And Verify Fails with TableNotEnabledException
> 
>
> Key: HBASE-9776
> URL: https://issues.apache.org/jira/browse/HBASE-9776
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Minor
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9776.patch
>
>
> Occasionally IntegrationTestLoadAndVerify failed with the following error. 
> This is caused by RPC retry and the first attempt actually went through 
> successfully and the second retry attempt fails because the table is disabled 
> by the first attempt.
> {code}
> 2013-10-10 
> 19:55:54,339|beaver.machine|INFO|org.apache.hadoop.hbase.TableNotEnabledException:
>  org.apache.hadoop.hbase.TableNotEnabledException: 
> IntegrationTestLoadAndVerify
> 2013-10-10 19:55:54,340|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.handler.DisableTableHandler.prepare(DisableTableHandler.java:100)
> 2013-10-10 19:55:54,341|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1979)
> 2013-10-10 19:55:54,342|beaver.machine|INFO|at 
> org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:1990)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-10-17 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798685#comment-13798685
 ] 

Jeffrey Zhong commented on HBASE-9775:
--

[~eclark] Have you seen some region servers receive a much higher incoming 
request rate than others in your test? Thanks.

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
> Search   Cloudera Manager.png, job_run.log, short_ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9775) Client write path perf issues

2013-10-17 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9775:
-

Attachment: hbase-9775.patch

I think I found a bug in AsyncProcess that hurts performance. Below is the 
code snippet:
{code}
  incTaskCounters(multiAction.getRegions(), loc.getServerName());
  Runnable runnable = Trace.wrap("AsyncProcess.sendMultiAction", new Runnable() {
    // ... run() body elided ...
        receiveMultiAction(initialActions, multiAction, loc, res,
            numAttempt, errorsByServer);
      } finally {
        decTaskCounters(multiAction.getRegions(), loc.getServerName());
      }
{code}
receiveMultiAction resubmits failed edits recursively, so we bump the task 
counter twice when an error happens, and the overlap lasts for a retry 
interval, which is quite a long time for client operations.

I attached a patch for your reference.
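The double-bump can be sketched with a toy model (hypothetical names only — 
submit, taskCounter and peakCount are stand-ins, not the real AsyncProcess 
API): the counter is incremented before an action is sent, and a failure path 
that re-enters the same entry point increments it again before the first 
attempt's finally block has decremented it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the task-counter issue: submit() increments the counter,
// and a failure handler that re-enters submit() bumps it a second time before
// the first attempt's finally block has run its decrement.
public class TaskCounterSketch {
    static final AtomicInteger taskCounter = new AtomicInteger();
    static int peakCount = 0;

    static void submit(boolean failOnce) {
        taskCounter.incrementAndGet();                 // inc before sending
        peakCount = Math.max(peakCount, taskCounter.get());
        try {
            if (failOnce) {
                submit(false);                         // resubmit: second inc
            }
        } finally {
            taskCounter.decrementAndGet();             // dec only after retry returns
        }
    }

    public static void main(String[] args) {
        submit(true);
        // Two task slots were held for one logical action while the retry ran.
        System.out.println("peak counter = " + peakCount);
    }
}
```

The model only shows that two slots are held for one logical action while the 
resubmission is in flight; how long that overlap lasts is the point debated in 
the comments below.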

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
> Search   Cloudera Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.





[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-10-18 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799222#comment-13799222
 ] 

Jeffrey Zhong commented on HBASE-9775:
--

[~nkeywal] You're right, it's not truly recursive. The window during which the 
task counter is double-bumped is very short, depending on how quickly the 
runnables are resubmitted. We still have an issue, though: failed operations 
stay in the retry queue and keep holding task counters even during the retry 
sleep interval. In Elliott's tests there were retry failures, so I think that 
blocked some new edits from being sent out. In the posted patch, we only bump 
the task counter while tasks are actually being sent out and awaiting a 
response.

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
> Search   Cloudera Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, 
> ycsb_insert_94_vs_96.png
>
>
> Testing on larger clusters has not had the desired throughput increases.





[jira] [Commented] (HBASE-9775) Client write path perf issues

2013-10-18 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799705#comment-13799705
 ] 

Jeffrey Zhong commented on HBASE-9775:
--

[~eclark] Is the client config setting "hbase.client.pause" at its default of 
100ms? 0.96 retries much more aggressively than 0.94; for example, its longest 
retry interval is 1/6 of 0.94's. Using 200ms might avoid overwhelming the RS. 
This aggressive retrying, combined with the factor I mentioned above (failed 
retries still holding task quota), may hurt performance significantly.
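To make the effect of the base pause concrete, here is a rough backoff 
calculator (the multiplier table below is illustrative only — it is not copied 
from either release's HConstants): each retry sleeps the base pause times a 
multiplier, so doubling the pause doubles every retry interval.

```java
// Illustrative retry-backoff calculator: each attempt sleeps pause * multiplier.
// The multiplier table here is hypothetical; the point is that a small base
// pause (100ms) keeps even late retries short, hammering a struggling RS.
public class BackoffSketch {
    static final int[] MULTIPLIERS = {1, 2, 3, 5, 10, 20, 40, 100};

    static long pauseForAttempt(long basePauseMs, int attempt) {
        int idx = Math.min(attempt, MULTIPLIERS.length - 1);
        return basePauseMs * MULTIPLIERS[idx];
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt
                + ": pause=100ms -> " + pauseForAttempt(100, attempt) + "ms,"
                + " pause=200ms -> " + pauseForAttempt(200, attempt) + "ms");
        }
    }
}
```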

> Client write path perf issues
> -
>
> Key: HBASE-9775
> URL: https://issues.apache.org/jira/browse/HBASE-9775
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.0
>Reporter: Elliott Clark
>Priority: Critical
> Attachments: Charts Search   Cloudera Manager - ITBLL.png, Charts 
> Search   Cloudera Manager.png, hbase-9775.patch, job_run.log, short_ycsb.png, 
> ycsb_insert_94_vs_96.png, ycsb.png
>
>
> Testing on larger clusters has not had the desired throughput increases.





[jira] [Updated] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-5487:
-

Attachment: Is the FATE of Assignment Manager FATE.pdf

There are already two good write-ups on the topic. Here is yet another one.

My motivation for posting this small draft is that I think we need a 
systematic model for things like table operations, region assignment, and 
similar future tasks, so that we can design them in a unified way that is easy 
for people to follow, easy to add testing capabilities to, and easy to reason 
about.

I think FATE provides those capabilities. We can view FATE as a design/system 
model rather than as a cold Accumulo implementation (in the sense that we can 
change the implementation to fit HBase's cases). 
Under this model, we can simplify region assignment and force feature 
implementers to code in such a way that a partially failed operation can be 
resumed/retried, leaving execution to the framework.

The draft is only two pages long (if you know FATE, it should only take one 
page), and I hope we don't drop FATE as a design option too quickly.
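For readers unfamiliar with FATE, the core idea can be sketched in a few lines 
(a loose, from-memory simplification inspired by Accumulo's Repo interface — 
not HBase or Accumulo code): each step performs one idempotent unit of work 
and returns the next step, and because the current step would be persisted 
before execution, a partially failed operation can resume where it left off.

```java
// Minimal FATE-style sketch: an operation is a chain of small idempotent
// steps, each returning the next step (null = done). In a real framework the
// current step would be persisted (e.g. to ZooKeeper) before each call, so a
// crashed operation resumes from its last persisted step.
public class FateSketch {
    interface Step {
        Step call(StringBuilder log);  // do one unit of work, return next step
    }

    static String run(Step start) {
        StringBuilder log = new StringBuilder();
        for (Step s = start; s != null; s = s.call(log)) {
            // persistence of the current step would happen here
        }
        return log.toString();
    }

    public static void main(String[] args) {
        // A toy two-step "disable table" operation.
        Step flip = log -> { log.append("flip;"); return null; };
        Step close = log -> { log.append("close-regions;"); return flip; };
        System.out.println(run(close));
    }
}
```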


> Generic framework for Master-coordinated tasks
> --
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver, Zookeeper
>Affects Versions: 0.94.0
>Reporter: Mubarak Seyed
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: Entity management in Master - part 1.pdf, 
> hbckMasterV2-long.pdf, Is the FATE of Assignment Manager FATE.pdf, Region 
> management in Master5.docx, Region management in Master.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online-scheme change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core 
> components





[jira] [Created] (HBASE-9822) IntegrationTestLazyCfLoading failed occasionally in a secure environment

2013-10-22 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9822:


 Summary: IntegrationTestLazyCfLoading failed occasionally in a 
secure environment
 Key: HBASE-9822
 URL: https://issues.apache.org/jira/browse/HBASE-9822
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Trivial


This test case failed once in a secure deployment with the following error. 
It's due to a race condition between when the writers start writing and when 
the table ACLs propagate to the region servers.

{code}
2013-10-14 13:03:32,185 ERROR [HBaseWriterThread_8] 
util.MultiThreadedWriterBase: Failed to insert: 10 after 167ms; region 
information: cached: 
region=IntegrationTestLazyCfLoading,bffd,1381755808862.456a11d22693f7dc27763c32e55521a8.,
 
hostname=gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694,
 seqNum=1; cache is up to date; errors: exception from 
gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal:60020 for 
d3d9446802a44259755d38e6d163e820-10
E   org.apache.hadoop.hbase.security.AccessDeniedException: 
org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
permissions (table=IntegrationTestLazyCfLoading, family: essential:filter, 
action=WRITE)

{code}

Writes were sent at 13:03:32,032
{code}
2013-10-14 13:03:32,032 WARN  [htable-pool11-t1] client.AsyncProcess: Attempt 
#1/35 failed for 1 ops on 
gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694 NOT 
resubmitting.
{code}

While the permission propagation happened at 13:03:32,109 on region server
{code}
2013-10-14 13:03:32,109 DEBUG [regionserver60020-EventThread] 
access.ZKPermissionWatcher: Updating permissions cache from node 
IntegrationTestLazyCfLoading with data: 
PBUF\x0AA\x0A\x06hrt_qa\x127\x08\x03"3\x0A'\x0A\x07default\x12\x1CIntegrationTestLazyCfLoading
 \x00 \x01 \x02 \x03 \x04
{code}





[jira] [Updated] (HBASE-9822) IntegrationTestLazyCfLoading failed occasionally in a secure environment

2013-10-22 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9822:
-

Attachment: hbase-9822.patch

There is no good way to know whether the ACL permissions have been received by 
the region servers, so I added a sleep to the test case.
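As a general alternative to a fixed sleep, a bounded polling loop can be used 
whenever there is some observable condition to probe (a generic sketch only; 
as noted above, there is no real API to query the region servers' ACL cache, 
which is why the test falls back to sleeping):

```java
import java.util.function.BooleanSupplier;

// Generic bounded wait: poll a condition until it holds or a deadline passes.
// The condition itself (e.g. "can this user write to the table yet?") would be
// a test-specific probe supplied by the caller.
public class WaitForCondition {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;   // condition became true before the deadline
            }
            Thread.sleep(pollMs);
        }
        return condition.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

Compared with a flat sleep, this returns as soon as the condition holds and 
only fails after a full timeout.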

> IntegrationTestLazyCfLoading failed occasionally in a secure environment
> ---
>
> Key: HBASE-9822
> URL: https://issues.apache.org/jira/browse/HBASE-9822
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Trivial
> Attachments: hbase-9822.patch
>
>
> This test case failed once in a secure deployment with the following error. 
> It's due to a race condition between when the writers start writing and when 
> the table ACLs propagate to the region servers.
> {code}
> 2013-10-14 13:03:32,185 ERROR [HBaseWriterThread_8] 
> util.MultiThreadedWriterBase: Failed to insert: 10 after 167ms; region 
> information: cached: 
> region=IntegrationTestLazyCfLoading,bffd,1381755808862.456a11d22693f7dc27763c32e55521a8.,
>  
> hostname=gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694,
>  seqNum=1; cache is up to date; errors: exception from 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal:60020 for 
> d3d9446802a44259755d38e6d163e820-10
> E   org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions (table=IntegrationTestLazyCfLoading, family: essential:filter, 
> action=WRITE)
> 
> {code}
> Writes were sent at 13:03:32,032
> {code}
> 2013-10-14 13:03:32,032 WARN  [htable-pool11-t1] client.AsyncProcess: Attempt 
> #1/35 failed for 1 ops on 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694 NOT 
> resubmitting.
> {code}
> While the permission propagation happened at 13:03:32,109 on region server
> {code}
> 2013-10-14 13:03:32,109 DEBUG [regionserver60020-EventThread] 
> access.ZKPermissionWatcher: Updating permissions cache from node 
> IntegrationTestLazyCfLoading with data: 
> PBUF\x0AA\x0A\x06hrt_qa\x127\x08\x03"3\x0A'\x0A\x07default\x12\x1CIntegrationTestLazyCfLoading
>  \x00 \x01 \x02 \x03 \x04
> {code}





[jira] [Updated] (HBASE-9822) IntegrationTestLazyCfLoading failed occasionally in a secure environment

2013-10-22 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9822:
-

Status: Patch Available  (was: Open)

> IntegrationTestLazyCfLoading failed occasionally in a secure environment
> ---
>
> Key: HBASE-9822
> URL: https://issues.apache.org/jira/browse/HBASE-9822
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Trivial
> Attachments: hbase-9822.patch
>
>
> This test case failed once in a secure deployment with the following error. 
> It's due to a race condition between when the writers start writing and when 
> the table ACLs propagate to the region servers.
> {code}
> 2013-10-14 13:03:32,185 ERROR [HBaseWriterThread_8] 
> util.MultiThreadedWriterBase: Failed to insert: 10 after 167ms; region 
> information: cached: 
> region=IntegrationTestLazyCfLoading,bffd,1381755808862.456a11d22693f7dc27763c32e55521a8.,
>  
> hostname=gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694,
>  seqNum=1; cache is up to date; errors: exception from 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal:60020 for 
> d3d9446802a44259755d38e6d163e820-10
> E   org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions (table=IntegrationTestLazyCfLoading, family: essential:filter, 
> action=WRITE)
> 
> {code}
> Writes were sent at 13:03:32,032
> {code}
> 2013-10-14 13:03:32,032 WARN  [htable-pool11-t1] client.AsyncProcess: Attempt 
> #1/35 failed for 1 ops on 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694 NOT 
> resubmitting.
> {code}
> While the permission propagation happened at 13:03:32,109 on region server
> {code}
> 2013-10-14 13:03:32,109 DEBUG [regionserver60020-EventThread] 
> access.ZKPermissionWatcher: Updating permissions cache from node 
> IntegrationTestLazyCfLoading with data: 
> PBUF\x0AA\x0A\x06hrt_qa\x127\x08\x03"3\x0A'\x0A\x07default\x12\x1CIntegrationTestLazyCfLoading
>  \x00 \x01 \x02 \x03 \x04
> {code}





[jira] [Commented] (HBASE-8552) fix coverage org.apache.hadoop.hbase.rest.filter

2013-10-28 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807470#comment-13807470
 ] 

Jeffrey Zhong commented on HBASE-8552:
--

Patch looks good to me, +1. If there are no objections, I'll commit it by end 
of tomorrow.

[~aklochkov] Could you please re-copy the license text portion for both the 
0.94 and trunk patches? The format in the current patch differs from that in 
the other files. Thanks.

> fix coverage org.apache.hadoop.hbase.rest.filter 
> -
>
> Key: HBASE-8552
> URL: https://issues.apache.org/jira/browse/HBASE-8552
> Project: HBase
>  Issue Type: Test
>Reporter: Aleksey Gorshkov
>Assignee: Andrey Klochkov
> Attachments: HBASE-8552-0.94.patch, HBASE-8552-trunk--n2.patch, 
> HBASE-8552-trunk.patch
>
>






[jira] [Commented] (HBASE-8557) fix coverage org.apache.hadoop.hbase.rest.metrics

2013-10-28 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13807486#comment-13807486
 ] 

Jeffrey Zhong commented on HBASE-8557:
--

+1 on the patch. If no one objects, I'll commit it into 0.94. Thanks.

> fix coverage org.apache.hadoop.hbase.rest.metrics
> -
>
> Key: HBASE-8557
> URL: https://issues.apache.org/jira/browse/HBASE-8557
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.9
>Reporter: Aleksey Gorshkov
> Attachments: HBASE-8557-0.94.patch
>
>






[jira] [Commented] (HBASE-9856) Fix some findbugs Performance Warnings

2013-10-29 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808219#comment-13808219
 ] 

Jeffrey Zhong commented on HBASE-9856:
--

Looks good to me(+1). Thanks.

> Fix some findbugs Performance Warnings
> --
>
> Key: HBASE-9856
> URL: https://issues.apache.org/jira/browse/HBASE-9856
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.98.0
>
> Attachments: 9856-v1.txt
>
>
> These are the warnings to be fixed:
> {code}
> SIC Should org.apache.hadoop.hbase.regionserver.HRegion$RowLock be a _static_ 
> inner class?
> UPM Private method 
> org.apache.hadoop.hbase.security.access.AccessController.requirePermission(String,
>  String, Permission$Action[]) is never called
> WMI Method 
> org.apache.hadoop.hbase.regionserver.wal.WALEditsReplaySink.replayEntries(List)
>  makes inefficient use of keySet iterator instead of entrySet iterator
> {code}





[jira] [Commented] (HBASE-8559) increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor

2013-10-29 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808420#comment-13808420
 ] 

Jeffrey Zhong commented on HBASE-8559:
--

[~iveselovsky] I tried the trunk patch. It seems it needs a rebase. Thanks.

> increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor
> --
>
> Key: HBASE-8559
> URL: https://issues.apache.org/jira/browse/HBASE-8559
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.98.0, 0.94.8, 0.95.2
>Reporter: Ivan A. Veselovsky
>Assignee: Ivan A. Veselovsky
> Attachments: HBASE-8559-0.94--N2.patch, HBASE-8559-0.94--N3.patch, 
> HBASE-8559-trunk--N2.patch, HBASE-8559-trunk--N3.patch
>
>
> increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor up 
> to 80%.





[jira] [Updated] (HBASE-8552) fix coverage org.apache.hadoop.hbase.rest.filter

2013-10-30 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8552:
-

   Resolution: Fixed
Fix Version/s: 0.94.14
   0.96.1
   0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks [~aklochkov] for the new test and [~te...@apache.org] for reviews! I've 
integrated the patch into 0.94, 0.96 and trunk branches.

> fix coverage org.apache.hadoop.hbase.rest.filter 
> -
>
> Key: HBASE-8552
> URL: https://issues.apache.org/jira/browse/HBASE-8552
> Project: HBase
>  Issue Type: Test
>Reporter: Aleksey Gorshkov
>Assignee: Andrey Klochkov
> Fix For: 0.98.0, 0.96.1, 0.94.14
>
> Attachments: HBASE-8552-0.94.patch, HBASE-8552-trunk--n2.patch, 
> HBASE-8552-trunk--n3.patch, HBASE-8552-trunk.patch
>
>






[jira] [Updated] (HBASE-8557) fix coverage org.apache.hadoop.hbase.rest.metrics

2013-10-30 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8557:
-

Assignee: Aleksey Gorshkov

> fix coverage org.apache.hadoop.hbase.rest.metrics
> -
>
> Key: HBASE-8557
> URL: https://issues.apache.org/jira/browse/HBASE-8557
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.9
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: HBASE-8557-0.94.patch
>
>






[jira] [Updated] (HBASE-8557) fix coverage org.apache.hadoop.hbase.rest.metrics

2013-10-30 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8557:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Thanks [~aleksgor] for the new test! I've committed it into 0.94 branch.

> fix coverage org.apache.hadoop.hbase.rest.metrics
> -
>
> Key: HBASE-8557
> URL: https://issues.apache.org/jira/browse/HBASE-8557
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.9
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Attachments: HBASE-8557-0.94.patch
>
>






[jira] [Updated] (HBASE-8557) fix coverage org.apache.hadoop.hbase.rest.metrics

2013-10-30 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8557:
-

Fix Version/s: 0.94.14

> fix coverage org.apache.hadoop.hbase.rest.metrics
> -
>
> Key: HBASE-8557
> URL: https://issues.apache.org/jira/browse/HBASE-8557
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.94.9
>Reporter: Aleksey Gorshkov
>Assignee: Aleksey Gorshkov
> Fix For: 0.94.14
>
> Attachments: HBASE-8557-0.94.patch
>
>






[jira] [Updated] (HBASE-9822) IntegrationTestLazyCfLoading failed occasionally in a secure environment

2013-10-30 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9822:
-

   Resolution: Fixed
Fix Version/s: 0.96.1
   0.98.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks [~saint@gmail.com] for the review! I've committed the patch into the 
0.96 and trunk branches.

> IntegrationTestLazyCfLoading failed occasionally in a secure environment
> ---
>
> Key: HBASE-9822
> URL: https://issues.apache.org/jira/browse/HBASE-9822
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Trivial
> Fix For: 0.98.0, 0.96.1
>
> Attachments: hbase-9822.patch
>
>
> This test case failed once in a secure deployment with the following error. 
> It's due to a race condition between when the writers start writing and when 
> the table ACLs propagate to the region servers.
> {code}
> 2013-10-14 13:03:32,185 ERROR [HBaseWriterThread_8] 
> util.MultiThreadedWriterBase: Failed to insert: 10 after 167ms; region 
> information: cached: 
> region=IntegrationTestLazyCfLoading,bffd,1381755808862.456a11d22693f7dc27763c32e55521a8.,
>  
> hostname=gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694,
>  seqNum=1; cache is up to date; errors: exception from 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal:60020 for 
> d3d9446802a44259755d38e6d163e820-10
> E   org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient 
> permissions (table=IntegrationTestLazyCfLoading, family: essential:filter, 
> action=WRITE)
> 
> {code}
> Writes were sent at 13:03:32,032
> {code}
> 2013-10-14 13:03:32,032 WARN  [htable-pool11-t1] client.AsyncProcess: Attempt 
> #1/35 failed for 1 ops on 
> gs-hdp2-secure-1381732260-hbase-8.cs1cloud.internal,60020,1381755752694 NOT 
> resubmitting.
> {code}
> While the permission propagation happened at 13:03:32,109 on region server
> {code}
> 2013-10-14 13:03:32,109 DEBUG [regionserver60020-EventThread] 
> access.ZKPermissionWatcher: Updating permissions cache from node 
> IntegrationTestLazyCfLoading with data: 
> PBUF\x0AA\x0A\x06hrt_qa\x127\x08\x03"3\x0A'\x0A\x07default\x12\x1CIntegrationTestLazyCfLoading
>  \x00 \x01 \x02 \x03 \x04
> {code}





[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

2013-10-30 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809858#comment-13809858
 ] 

Jeffrey Zhong commented on HBASE-9360:
--

I did a prototype at https://github.com/hortonworks/HBaseReplicationBridgeServer 
and tested replication from a 0.94 cluster to a 0.96 cluster.

The remaining work is to bring the slave cluster to a good base (we currently 
use CopyTable for this) when setting up replication.

h6. Support import from a 0.94 export sequence file. 
Currently we protobuf-serialize org.apache.hadoop.hbase.client.Result, so a 
0.96 cluster cannot import 0.94 exported files, though we could easily add 
that support. (my personally preferred option)

h6. Use snapshots without any code changes:
1) Set up replication without starting the replication bridge server, so the 
source cluster queues WALs
2) Use a snapshot to bring the destination cluster to a good base
3) Start the replication bridge servers to drain the WALs queued up in step 1

h6. Enable CopyTable against the Replication Bridge Server
This option is the least desirable because it involves significant code 
changes: a faked root znode, plus support for root table scans, meta table 
scans, and the delete and multi commands in the replication bridge server.



> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -
>
> Key: HBASE-9360
> URL: https://issues.apache.org/jira/browse/HBASE-9360
> Project: HBase
>  Issue Type: Brainstorming
>  Components: migration
>Affects Versions: 0.98.0, 0.96.0
>Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release. As of today, a 0.94 HBase user has 
> to do an in-place upgrade: make the corresponding client changes, recompile 
> client application code, fully shut down the existing 0.94 cluster, deploy 
> the 0.96 binary, run the upgrade script, and then start the upgraded cluster. 
> You can imagine how the down time will be extended if something goes wrong in 
> between. 
> To minimize the down time, another possible way is to set up a secondary 0.96 
> cluster and then set up replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps would be:
> 1) Set up a 0.96 cluster
> 2) Set up replication from the running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait until they're in sync
> 4) Start duplicate writes to both the 0.94 and 0.96 clusters (replication 
> could stop now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good, completing the upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea by [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service, or build on top 
> of https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep, with 
> support for three commands: replicateLogEntries, multi and delete. Inside the 
> three commands, we just pass the corresponding requests down to the 
> destination 0.96 cluster as a bridge. The reason to support multi and delete 
> is so that CopyTable can copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of the 0.94 RPC protocol in 
> 0.96. One issue with this is that a 0.94 client needs to talk to ZooKeeper 
> first before it can connect to a 0.96 region server, so we would need a faked 
> ZooKeeper setup in front of the 0.96 cluster for a 0.94 client to connect to. 
> It may also pollute the 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 cluster at the same time, 
> we need to load both HBase clients into a single JVM using different class 
> loaders.
> Let me know if you think this is worth doing, or any better approach we could 
> take.
> Thanks!





[jira] [Commented] (HBASE-9867) Save on array copies with a subclass of LiteralByteString

2013-10-31 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810521#comment-13810521
 ] 

Jeffrey Zhong commented on HBASE-9867:
--

This is a great catch. I was wondering a while back why ByteString 
construction was so heavy.

Just out of curiosity: why didn't you put the LiteralByteString construction 
into ProtobufUtil? And for the function zeroCopyGetBytes, could we pass 
LiteralByteString as the input parameter type, since the function doesn't 
support other types anyway? Thanks. Great patch.
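The copy-versus-wrap distinction the patch exploits can be shown generically 
(stand-in classes only — the real subclass must live in the 
com.google.protobuf package to reach the package-private LiteralByteString 
constructor, which the stand-ins below do not attempt):

```java
import java.util.Arrays;

// Stand-in for the copy-vs-wrap distinction. CopyingBytes mimics
// ByteString.copyFrom (defensive copy); WrappingBytes mimics the
// LiteralByteString-subclass trick of holding the caller's array directly.
public class ZeroCopySketch {
    static class CopyingBytes {
        final byte[] bytes;
        CopyingBytes(byte[] src) { this.bytes = Arrays.copyOf(src, src.length); }
    }

    static class WrappingBytes {
        final byte[] bytes;
        WrappingBytes(byte[] src) { this.bytes = src; }  // no copy: shares the array
    }

    public static void main(String[] args) {
        byte[] row = {1, 2, 3};
        System.out.println("copy shares array: " + (new CopyingBytes(row).bytes == row));
        System.out.println("wrap shares array: " + (new WrappingBytes(row).bytes == row));
    }
}
```

The wrapping variant saves one allocation and one array copy per field added 
to a message, at the cost of requiring that the caller never mutate the array 
afterwards.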

> Save on array copies with a subclass of LiteralByteString
> -
>
> Key: HBASE-9867
> URL: https://issues.apache.org/jira/browse/HBASE-9867
> Project: HBase
>  Issue Type: Improvement
>  Components: Protobufs
>Affects Versions: 0.96.0
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9867.txt, 9867.txt
>
>
> Any time we add a byte array to a protobuf, it'll copy the byte array.
> I was playing with the client and noticed how a bunch of CPU and copying was 
> being done just to copy basic arrays doing pb construction.  I started to 
> look at ByteString and then remembered a class Benoit sent me a while back 
> that I did not understand from his new AsyncHBase.  After looking in 
> ByteString it made now sense.  So, rather than copy byte arrays everywhere, 
> do a version of a ByteString that instead wraps the array.





[jira] [Commented] (HBASE-9867) Save on array copies with a subclass of LiteralByteString

2013-10-31 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810629#comment-13810629
 ] 

Jeffrey Zhong commented on HBASE-9867:
--

{quote}
LBS is package private to com.google.protobuf
{quote}
I see. Thanks for the clarifications.

> Save on array copies with a subclass of LiteralByteString
> -
>
> Key: HBASE-9867
> URL: https://issues.apache.org/jira/browse/HBASE-9867
> Project: HBase
>  Issue Type: Improvement
>  Components: Protobufs
>Affects Versions: 0.96.0
>Reporter: stack
>Assignee: stack
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9867.txt, 9867.txt
>
>
> Any time we add a byte array to a protobuf, it'll copy the byte array.
> I was playing with the client and noticed how a bunch of CPU and copying was 
> being done just to copy basic arrays doing pb construction.  I started to 
> look at ByteString and then remembered a class Benoit sent me a while back 
> that I did not understand from his new AsyncHBase.  After looking in 
> ByteString it made now sense.  So, rather than copy byte arrays everywhere, 
> do a version of a ByteString that instead wraps the array.





[jira] [Commented] (HBASE-8559) increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor

2013-10-31 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810801#comment-13810801
 ] 

Jeffrey Zhong commented on HBASE-8559:
--

The trunk patch looks good to me (+1).

> increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor
> --
>
> Key: HBASE-8559
> URL: https://issues.apache.org/jira/browse/HBASE-8559
> Project: HBase
>  Issue Type: Test
>Affects Versions: 0.98.0, 0.94.8, 0.95.2
>Reporter: Ivan A. Veselovsky
>Assignee: Ivan A. Veselovsky
> Attachments: HBASE-8559-0.94--N2.patch, HBASE-8559-0.94--N3.patch, 
> HBASE-8559-trunk--N2.patch, HBASE-8559-trunk--N3.patch, 
> HBASE-8559-trunk--N4.patch
>
>
> increase unit-test coverage of package org.apache.hadoop.hbase.coprocessor up 
> to 80%.





[jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog split

2013-11-01 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811575#comment-13811575
 ] 

Jeffrey Zhong commented on HBASE-9873:
--

{quote}
Support running multiple hlog splitters on a single RS
{quote}
We can make this configurable (HBASE-9736). In many cases recovery happens 
while a cluster is serving live traffic, so you normally don't want recovery 
traffic to affect the live traffic too much. Making the number of log 
splitters configurable normally helps when the cluster has free IO capacity 
(e.g. SSD clusters) or in distributedLogReplay mode, where recovery adds no 
extra small random writes for recovery.edits files.

When a WAL split takes 30+ seconds, I'd guess that opening more splitters may 
have the opposite effect: log splitting is normally bottlenecked on the 
writing side, with the reader idling while it waits for writes to finish. 
Opening more splitters adds more write load to the cluster, so it could even 
slow down the current split task.

{quote}
Try to clean old hlog after each memstore flush to avoid unnecessary hlogs 
split in failover. Now hlogs cleaning only be run in rolling hlog writer.
{quote}
I have a different idea in this area: we could be smarter about log cleaning. 
For example, we can maintain in memory the last flushed sequence number of 
each region, and the regions covered by each WAL, so a log cleaner can clean 
WALs out of order instead of checking the global smallest flushed sequence 
number.
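A minimal sketch of that bookkeeping (the class and method names here are hypothetical illustrations, not HBase APIs; it only shows the per-WAL archival check the in-memory maps would enable):

```java
import java.util.Map;

// Sketch: a WAL can be archived out of order when, for every region it carries,
// that region's last flushed sequence id has passed the WAL's highest edit for it.
class WalCleanerSketch {
    /**
     * @param walMaxSeqPerRegion highest edit sequence id this WAL holds, per region
     * @param lastFlushedSeq     last flushed sequence id, per region
     */
    static boolean canArchive(Map<String, Long> walMaxSeqPerRegion,
                              Map<String, Long> lastFlushedSeq) {
        for (Map.Entry<String, Long> e : walMaxSeqPerRegion.entrySet()) {
            long flushed = lastFlushedSeq.getOrDefault(e.getKey(), -1L);
            if (flushed < e.getValue()) {
                return false; // some edits in this WAL are not yet flushed to hfiles
            }
        }
        return true; // no region still depends on this WAL
    }
}
```

With such a check, the cleaner can drop any individual WAL whose regions have all flushed past it, regardless of older WALs.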

{quote}
5) Enable multiple splitters on 'big' hlog file by splitting(logically) hlog to 
slices(configurable size, eg hdfs trunk size 64M)
{quote}
I'd wait for our multiple-WAL solution. This suggestion basically assumes we 
have spare IO capacity but too few worker slots; with multiple splitters per 
RS and a bounded WAL size, it doesn't seem necessary.

{quote}
7) Consider the hlog data locality when schedule the hlog split task. Schedule 
the hlog to a splitter which is near to hlog data.
{quote}
We have a JIRA HBASE-6772 on this.

In general, RS failure recovery spends a huge percentage of its time in 
failure detection. It'd be better if we could look into that as well.  Thanks.





> Some improvements in hlog and hlog split
> 
>
> Key: HBASE-9873
> URL: https://issues.apache.org/jira/browse/HBASE-9873
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, wal
>Reporter: Liu Shaohui
>Priority: Critical
>  Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog 
> splits during failover. Now hlog cleaning is only run by the rolling hlog writer. 
> 2) Add a background hlog compaction thread to compact hlogs: remove the hlog 
> entries whose data have been flushed to hfiles. The scenario is that in a shared 
> cluster, write requests to a table may be very infrequent and periodic, so a lot 
> of hlogs cannot be cleaned because of entries of that table in those hlogs.
> 3) Rely on the smallest of all the biggest hfile seqIds of previously served 
> regions to ignore some entries. Facebook has implemented this in HBASE-6508 
> and we backported it to hbase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the 
> master (the latter can boost split efficiency for a tiny cluster)
> 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting the 
> hlog into slices (of configurable size, e.g. the hdfs block size, 64M), and 
> support concurrent split tasks on a single hlog file slice 
> 6) Do not cancel a timed-out split task until another task reports success 
> (this avoids the scenario where a split for an hlog file fails because no single 
> task can succeed within the timeout period), and reschedule the same split task 
> to reduce split time (to avoid stragglers in hlog split)
> 7) Consider hlog data locality when scheduling an hlog split task: 
> schedule the hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers, switching to another hlog writer when write 
> latency to the current hlog is high due to a possible temporary network spike? 
> This is a draft listing the improvements to the hlog that we plan to implement 
> in the near future. Comments and discussions are welcome.





[jira] [Commented] (HBASE-9873) Some improvements in hlog and hlog split

2013-11-01 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811663#comment-13811663
 ] 

Jeffrey Zhong commented on HBASE-9873:
--

{quote}
In general, RS failure recovery spends huge percentage in detection time.
Sorry, which detection phase we are talking about here?
{quote}
Because an end-to-end RS failure includes both failed-region-server detection 
and recovery. Currently the ZK session timeout used to detect an RS failure is 
30 secs (by default), and we could detect a failure much earlier. For example, 
if all clients have trouble talking to an RS, the master could close the RS's 
ZK session and immediately trigger recovery without waiting for the ZK session 
timeout.

{quote}
Currently, a split log worker just picks log files randomly. I think it would 
be better to pick older files first, and pick the latest (which most probably, 
is incomplete) file last. 
{quote}
In other threads, [~nkeywal] and I discussed an idea to send recoverLease 
calls asynchronously for all log files and then put them in ZK one by one as 
each recoverLease completes. Then we wouldn't need to enforce a pick order in 
SplitLogWorker.
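A rough sketch of that flow (recoverLease below is a stub standing in for the real DFS call, and the split queue stands in for creating ZK tasks; names and structure are illustrative assumptions, not the actual patch):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: fire lease recovery for every log file at once, and hand each file to
// the split queue as soon as its own recovery finishes -- no pick order needed.
class AsyncLeaseRecoverySketch {
    // stand-in for dfs.recoverLease(path); assumed to block until the lease is freed
    static void recoverLease(String logFile) { }

    static List<String> recoverAll(List<String> logFiles) {
        ConcurrentLinkedQueue<String> splitQueue = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String f : logFiles) {
            pending.add(CompletableFuture.runAsync(() -> {
                recoverLease(f);
                splitQueue.add(f); // would create the ZK split task here
            }));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        return new ArrayList<>(splitQueue); // completion order, not submission order
    }
}
```

Workers then just take whatever task appears next, since only lease-recovered files ever reach the queue.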





> Some improvements in hlog and hlog split
> 
>
> Key: HBASE-9873
> URL: https://issues.apache.org/jira/browse/HBASE-9873
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, wal
>Reporter: Liu Shaohui
>Priority: Critical
>  Labels: failover, hlog
>
> Some improvements in hlog and hlog split
> 1) Try to clean old hlogs after each memstore flush to avoid unnecessary hlog 
> splits during failover. Now hlog cleaning is only run by the rolling hlog writer. 
> 2) Add a background hlog compaction thread to compact hlogs: remove the hlog 
> entries whose data have been flushed to hfiles. The scenario is that in a shared 
> cluster, write requests to a table may be very infrequent and periodic, so a lot 
> of hlogs cannot be cleaned because of entries of that table in those hlogs.
> 3) Rely on the smallest of all the biggest hfile seqIds of previously served 
> regions to ignore some entries. Facebook has implemented this in HBASE-6508 
> and we backported it to hbase 0.94 in HBASE-9568.
> 4) Support running multiple hlog splitters on a single RS and on the 
> master (the latter can boost split efficiency for a tiny cluster)
> 5) Enable multiple splitters on a 'big' hlog file by (logically) splitting the 
> hlog into slices (of configurable size, e.g. the hdfs block size, 64M), and 
> support concurrent split tasks on a single hlog file slice 
> 6) Do not cancel a timed-out split task until another task reports success 
> (this avoids the scenario where a split for an hlog file fails because no single 
> task can succeed within the timeout period), and reschedule the same split task 
> to reduce split time (to avoid stragglers in hlog split)
> 7) Consider hlog data locality when scheduling an hlog split task: 
> schedule the hlog to a splitter that is near the hlog data.
> 8) Support multiple hlog writers, switching to another hlog writer when write 
> latency to the current hlog is high due to a possible temporary network spike? 
> This is a draft listing the improvements to the hlog that we plan to implement 
> in the near future. Comments and discussions are welcome.





[jira] [Commented] (HBASE-9865) WALEdit.heapSize() is incorrect in certain replication scenarios which may cause RegionServers to go OOM

2013-11-04 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813129#comment-13813129
 ] 

Jeffrey Zhong commented on HBASE-9865:
--

[~lhofhansl] The following code is a dead code path and should never be 
reached in the current implementation. Still, the handling here is confusing.
{code}
else if (currentNbEntries != 0) {
...
considerDumping = true;
currentNbEntries = 0;
  }
{code}


> WALEdit.heapSize() is incorrect in certain replication scenarios which may 
> cause RegionServers to go OOM
> 
>
> Key: HBASE-9865
> URL: https://issues.apache.org/jira/browse/HBASE-9865
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.5, 0.95.0
>Reporter: churro morales
>Assignee: Lars Hofhansl
> Attachments: 9865-0.94-v2.txt, 9865-sample-1.txt, 9865-sample.txt, 
> 9865-trunk-v2.txt, 9865-trunk.txt
>
>
> WALEdit.heapSize() is incorrect in certain replication scenarios which may 
> cause RegionServers to go OOM.
> A little background on this issue. We noticed that our source replication 
> regionservers would get into GC storms and sometimes even OOM. 
> We saw a case with around 25k WALEdits to replicate, each one holding an 
> ArrayList of KeyValues. The ArrayList had a capacity of around 90k (using 
> 350KB of heap memory) but only around 6 non-null entries.
> When ReplicationSource.readAllEntriesToReplicateOrNextFile() gets a 
> WALEdit, it removes all kv's that are scoped other than local.  
> But in doing so we don't account for the capacity of the ArrayList when 
> determining heapSize for a WALEdit. The logic for shipping a batch is 
> whether you have hit a size capacity or a number-of-entries capacity.  
> Therefore, if you have a WALEdit with 25k entries and suppose all are 
> removed: the size of the ArrayList is 0 (we don't even count the collection's 
> heap size currently) but the capacity is ignored.
> This will yield a heapSize() of 0 bytes while in the best case it would be at 
> least 10 bytes (provided you pass initialCapacity and you have a 32-bit 
> JVM) 
> I have some ideas on how to address this problem and want to know everyone's 
> thoughts:
> 1. We use a probabilistic counter such as HyperLogLog and create something 
> like:
>   * class CapacityEstimateArrayList extends ArrayList
>   ** this class overrides all additive methods to update the 
> probabilistic counts
>   ** it includes one additional method called estimateCapacity 
> (we would take estimateCapacity - size() and fill in sizes for all references)
>   * Then we can do something like this in WALEdit.heapSize:
>   
> {code}
>   public long heapSize() {
> long ret = ClassSize.ARRAYLIST;
> for (KeyValue kv : kvs) {
>   ret += kv.heapSize();
> }
> long nullEntriesEstimate = kvs.getCapacityEstimate() - kvs.size();
> ret += ClassSize.align(nullEntriesEstimate * ClassSize.REFERENCE);
> if (scopes != null) {
>   ret += ClassSize.TREEMAP;
>   ret += ClassSize.align(scopes.size() * ClassSize.MAP_ENTRY);
>   // TODO this isn't quite right, need help here
> }
> return ret;
>   }   
> {code}
> 2. In ReplicationSource.removeNonReplicableEdits() we know the size of the 
> array originally, and we provide some percentage threshold.  When that 
> threshold is met (50% of the entries have been removed) we can call 
> kvs.trimToSize()
> 3. in the heapSize() method for WALEdit we could use reflection (Please don't 
> shoot me for this) to grab the actual capacity of the list.  Doing something 
> like this:
> {code}
> public int getArrayListCapacity()  {
> try {
>   Field f = ArrayList.class.getDeclaredField("elementData");
>   f.setAccessible(true);
>   return ((Object[]) f.get(kvs)).length;
> } catch (Exception e) {
>   log.warn("Exception in trying to get capacity on ArrayList", e);
>   return kvs.size();
> }
> }
> {code}
> I am partial to (1), using HyperLogLog and creating a 
> CapacityEstimateArrayList; this is reusable throughout the code for other 
> classes that implement HeapSize and contain ArrayLists. The memory 
> footprint is very small and it is very fast. The issue is that this is an 
> estimate, although we can configure the precision; we will most likely always 
> be conservative. The estimateCapacity will always be less than the 
> actualCapacity, but it will be close. I think that putting the logic in 
> removeNonReplicableEdits will work, but this only solves the heapSize problem 
> in this particular scenario. Solution 3 is slow and horrible but gives 
> us the exact answer.
> I would love to hear if anyone else has any other ideas.

[jira] [Created] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-05 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9895:


 Summary: 0.96 Import utility can't import an exported file from 
0.94
 Key: HBASE-9895
 URL: https://issues.apache.org/jira/browse/HBASE-9895
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.96.0
Reporter: Jeffrey Zhong


Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
cluster cannot import files exported from 0.94. This issue is annoying because 
a user can't import his old archive files after an upgrade, or archives from 
others who are still using 0.94.

The ideal way is to catch the deserialization error and then fall back to the 
0.94 format for importing.
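The catch-and-retry idea could look roughly like this (parsePb/parseWritable are hypothetical stand-ins for the 0.96 protobuf and pre-0.96 Writable deserializers; this sketches only the fallback shape, not the actual Import code):

```java
// Sketch: try the 0.96 (protobuf) decoder first; on failure, retry with the
// 0.94 (Writable) decoder so old export files still load.
class ResultDecoderSketch {
    static String decode(byte[] bytes) {
        try {
            return parsePb(bytes);          // 0.96 path
        } catch (IllegalArgumentException e) {
            return parseWritable(bytes);    // fall back to the 0.94 format
        }
    }

    // hypothetical stand-in: accepts only buffers starting with a 'P' magic byte
    static String parsePb(byte[] b) {
        if (b.length == 0 || b[0] != 'P') {
            throw new IllegalArgumentException("not a protobuf-encoded Result");
        }
        return "pb";
    }

    // hypothetical stand-in for the pre-0.96 Writable deserializer
    static String parseWritable(byte[] b) {
        return "writable";
    }
}
```

The tricky part in practice is that a malformed protobuf doesn't always fail fast, which is presumably why detecting the format dynamically is hard.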





[jira] [Updated] (HBASE-8018) Add "Flaky Testcase Detector" tool into dev-tools

2013-11-05 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8018:
-

Description: 
jenkins-tools
=

A tool which pulls test case results from a Jenkins server. It displays the 
union of failed test cases from the last 15 runs (by default; the actual 
number of jobs can be less depending on availability) recorded in the Jenkins 
server, and tracks how each of them performed across all of the last 15 runs 
(passed, not run, or failed)

*Pre-requirement(run under folder ./dev-support/jenkins-tools)*
   Please download jenkins-client from 
https://github.com/cosmin/jenkins-client
   1) git clone git://github.com/cosmin/jenkins-client.git
   2) make sure the dependency jenkins-client version in 
./buildstats/pom.xml matches the 
  downloaded jenkins-client(current value is 0.1.6-SNAPSHOT)
   
Build command(run under folder jenkins-tools):
{code}
   mvn clean package
{code}
Usage: 
{code}
   java -jar ./buildstats/target/buildstats.jar <jenkins server url> <job name> [number of last most recent jobs to check]
{code}
Sample commands are:
{code}
   java -jar ./buildstats/target/buildstats.jar https://builds.apache.org 
HBase-TRUNK
{code}
Sample output(where 1 means "PASSED", 0 means "NOT RUN AT ALL", -1 means 
"FAILED"):

Failed Test Cases Stats                                                           4360 4361 4362 4363 4364 4365 4366 4367 4368 4369

org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                   1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots      1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4361 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4362 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4363 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples
=== 4368 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.client.testadmin
org.apache.hadoop.hbase.client.testclonesnapshotfromclient
org.apache.hadoop.hbase.mapreduce.testmapreduceexamples

  was:
jenkins-tools
=

A tool which pulls test case results from a Jenkins server. It displays the 
union of failed test cases from the last 15 runs (by default; the actual 
number of jobs can be less depending on availability) recorded in the Jenkins 
server, and tracks how each of them performed across all of the last 15 runs 
(passed, not run, or failed)

*Pre-requirement(run under folder jenkins-tools)*
   Please download jenkins-client from 
https://github.com/cosmin/jenkins-client
   1) git clone git://github.com/cosmin/jenkins-client.git
   2) make sure the dependency jenkins-client version in 
./buildstats/pom.xml matches the 
  downloaded jenkins-client(current value is 0.1.6-SNAPSHOT)
   
Build command(run under folder jenkins-tools):
{code}
   mvn clean package
{code}
Usage: 
{code}
   java -jar ./buildstats/target/buildstats.jar <jenkins server url> <job name> [number of last most recent jobs to check]
{code}
Sample commands are:
{code}
   java -jar ./buildstats/target/buildstats.jar https://builds.apache.org 
HBase-TRUNK
{code}
Sample output(where 1 means "PASSED", 0 means "NOT RUN AT ALL", -1 means 
"FAILED"):

Failed Test Cases Stats                                                           4360 4361 4362 4363 4364 4365 4366 4367 4368 4369

org.apache.hadoop.hbase.backup.testhfilearchiving.testcleaningrace                   1    1    1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.migration.testnamespaceupgrade.testrenameusingsnapshots      1    1    1   -1    0    1    1    1    1    1

Skipped Test Cases Stats
=== 4360 skipped(Or don't have) following test suites ===
org.apache.hadoop.hbase.replication.testreplicationkillmasterrscompressed
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfilessplitrecovery
org.apache.hadoop.hbase.mapreduce.testsecureloadincrementalhfiles
o

[jira] [Assigned] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reassigned HBASE-9895:


Assignee: Jeffrey Zhong

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9895.patch

There's no good way to dynamically determine an input file's format in 0.94, 
so I'm introducing a system property like the following so that Import can 
load a file using the 0.94 deserializer.

{code}
./bin/hbase -Dhbase.input.version=0.94 org.apache.hadoop.hbase.mapreduce.Import 
<tablename> <inputdir>
{code}

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Status: Patch Available  (was: Open)

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Created] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-07 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9918:


 Summary: MasterAddressTracker & ZKNamespaceManager ZK listeners 
are missed after master recovery
 Key: HBASE-9918
 URL: https://issues.apache.org/jira/browse/HBASE-9918
 Project: HBase
  Issue Type: Bug
Reporter: Jeffrey Zhong


TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
fails at the following verification for me in my dev env (you have to run the 
single test, not the whole TestZooKeeper suite, to reproduce):
{code}
assertEquals("Number of rows should be equal to number of puts.", numberOfPuts, 
numberOfRows);
{code}

We miss two ZK listeners after master recovery: MasterAddressTracker and 
ZKNamespaceManager. 

My current patch fixes the JIRA issue, though I'm wondering if we should 
remove the master failover implementation for ZK session expiry entirely, 
because it partially reinitializes HMaster, which is error prone and not a 
clean state to start from. 

 








[jira] [Updated] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-07 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9918:
-

Attachment: HBase-9918.patch

This patch also refactors the TestZooKeeper suite so that the individual test 
cases inside it don't affect one another.

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
> Attachments: HBase-9918.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> fails at the following verification for me in my dev env (you have to run the 
> single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We miss two ZK listeners after master recovery: MasterAddressTracker and 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, though I'm wondering if we should 
> remove the master failover implementation for ZK session expiry entirely, 
> because it partially reinitializes HMaster, which is error prone and not a 
> clean state to start from. 
>  





[jira] [Commented] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-07 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816916#comment-13816916
 ] 

Jeffrey Zhong commented on HBASE-9918:
--

[~toffer] Is it all right to always call initNamespace() during master 
initialization, even in the master failover case? Thanks.

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
> Attachments: HBase-9918.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> fails at the following verification for me in my dev env (you have to run the 
> single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We miss two ZK listeners after master recovery: MasterAddressTracker and 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, though I'm wondering if we should 
> remove the master failover implementation for ZK session expiry entirely, 
> because it partially reinitializes HMaster, which is error prone and not a 
> clean state to start from. 
>  





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-07 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9895.patch

Re-attaching because the QA run errors seem unrelated to this patch.

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-07 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: (was: hbase-9895.patch)

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9918:
-

Status: Patch Available  (was: Open)

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
> Attachments: HBase-9918.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> fails at the following verification for me in my dev env (you have to run the 
> single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We miss two ZK listeners after master recovery: MasterAddressTracker and 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, though I'm wondering if we should 
> remove the master failover implementation for ZK session expiry entirely, 
> because it partially reinitializes HMaster, which is error prone and not a 
> clean state to start from. 
>  





[jira] [Assigned] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong reassigned HBASE-9918:


Assignee: Jeffrey Zhong

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: HBase-9918.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> fails at the following verification for me in my dev env (you have to run the 
> single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We miss two ZK listeners after master recovery: MasterAddressTracker and 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, though I'm wondering if we should 
> remove the master failover implementation for ZK session expiry entirely, 
> because it partially reinitializes HMaster, which is error prone and not a 
> clean state to start from. 
>  





[jira] [Commented] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817580#comment-13817580
 ] 

Jeffrey Zhong commented on HBASE-9895:
--

The ResultSerialization class extends Configured. Therefore, when mapreduce 
instantiates the class, the configuration is passed to the new instance 
automatically (magically). 
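A toy model of that mechanism (Configurable/Configured and newInstance below are simplified stand-ins for Hadoop's org.apache.hadoop.conf classes and ReflectionUtils.newInstance, just to illustrate the injection, not the real APIs):

```java
import java.lang.reflect.Constructor;
import java.util.Map;

// Sketch: a factory like ReflectionUtils.newInstance checks whether the freshly
// constructed object is Configurable and, if so, calls setConf on it -- which is
// how a class extending Configured receives the job configuration "magically".
class ConfInjectionSketch {
    interface Configurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    static class Configured implements Configurable {
        private Map<String, String> conf;
        public void setConf(Map<String, String> conf) { this.conf = conf; }
        public Map<String, String> getConf() { return conf; }
    }

    // plays the role of ResultSerialization extending Configured
    static class SerializationLike extends Configured { }

    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf); // the "magic" injection step
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

So the serialization class never asks for the configuration; the framework pushes it in at construction time.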

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: hbase-9895.patch
>
>
> Basically we protobuf'ed org.apache.hadoop.hbase.client.Result, so a 0.96 
> cluster cannot import files exported from 0.94. This issue is annoying because 
> a user can't import his old archive files after an upgrade, or archives from 
> others who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9918:
-

Attachment: hbase-9918.v2.patch

Thanks [~te...@apache.org] for the good catch! I've incorporated your comments 
into the v2 patch.

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: HBase-9918.patch, hbase-9918.v2.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> fails at the following verification for me in my dev env (you have to run the 
> single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We miss two ZK listeners after master recovery: MasterAddressTracker and 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, though I'm wondering if we should 
> remove the master failover implementation for ZK session expiry entirely, 
> because it partially reinitializes HMaster, which is error prone and not a 
> clean state to start from. 
>  





[jira] [Updated] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9918:
-

Attachment: hbase-9918.v1.patch

Re-attaching to trigger a QA run.

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: HBase-9918.patch, hbase-9918.v1.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> failed at the following verification for me in my dev env (you have to run 
> the single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We missed two ZK listeners after master recovery: MasterAddressTracker & 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, while I'm wondering if we should 
> totally remove the master failover implementation when the ZK session 
> expires, because this causes HMaster to be partially reinitialized, which is 
> error prone and not a clean state to start from. 





[jira] [Updated] (HBASE-9918) MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after master recovery

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9918:
-

Attachment: (was: hbase-9918.v2.patch)

> MasterAddressTracker & ZKNamespaceManager ZK listeners are missed after 
> master recovery
> ---
>
> Key: HBASE-9918
> URL: https://issues.apache.org/jira/browse/HBASE-9918
> Project: HBase
>  Issue Type: Bug
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: HBase-9918.patch, hbase-9918.v1.patch
>
>
> TestZooKeeper#testRegionAssignmentAfterMasterRecoveryDueToZKExpiry always 
> failed at the following verification for me in my dev env (you have to run 
> the single test, not the whole TestZooKeeper suite, to reproduce):
> {code}
> assertEquals("Number of rows should be equal to number of puts.", 
> numberOfPuts, numberOfRows);
> {code}
> We missed two ZK listeners after master recovery: MasterAddressTracker & 
> ZKNamespaceManager. 
> My current patch fixes the JIRA issue, while I'm wondering if we should 
> totally remove the master failover implementation when the ZK session 
> expires, because this causes HMaster to be partially reinitialized, which is 
> error prone and not a clean state to start from. 





[jira] [Commented] (HBASE-9932) Remove Master Recovery handling when ZK session expired

2013-11-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817947#comment-13817947
 ] 

Jeffrey Zhong commented on HBASE-9932:
--

[~saint@gmail.com] What do you think about this? Thanks.

> Remove Master Recovery handling when ZK session expired
> ---
>
> Key: HBASE-9932
> URL: https://issues.apache.org/jira/browse/HBASE-9932
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Jeffrey Zhong
>
> Currently we use HMaster#tryRecoveringExpiredZKSession to allow master 
> recovery from a ZK session expired error. While this partially reinitializes 
> HMaster, it is error prone because it's hard to guarantee the half-initialized 
> master is in a correct state. I have found several times already that the 
> registered ZK listeners are different before & after a failover.
> Since we already have HA support, I'm proposing to remove this handling. 
> Though we have a configuration setting "fail.fast.expired.active.master" to 
> skip the logic, why not go one step further and clean up the master code. 





[jira] [Created] (HBASE-9932) Remove Master Recovery handling when ZK session expired

2013-11-08 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-9932:


 Summary: Remove Master Recovery handling when ZK session expired
 Key: HBASE-9932
 URL: https://issues.apache.org/jira/browse/HBASE-9932
 Project: HBase
  Issue Type: Brainstorming
Reporter: Jeffrey Zhong


Currently we use HMaster#tryRecoveringExpiredZKSession to allow master recovery 
from a ZK session expired error. While this partially reinitializes HMaster, it 
is error prone because it's hard to guarantee the half-initialized master is in 
a correct state. I have found several times already that the registered ZK 
listeners are different before & after a failover.

Since we already have HA support, I'm proposing to remove this handling. Though 
we have a configuration setting "fail.fast.expired.active.master" to skip the 
logic, why not go one step further and clean up the master code. 





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: exportedTableIn94Format

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: (was: hbase-9390-part2-v2.patch)

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9390-part2-v2.patch

I added a test case. Since there is no way to include the binary 0.94 exported 
file in the patch, the new test case testImport94Table will fail in the QA run. 

The new test case runs fine in my local env with the binary file placed at 
hbase-server/src/test/resources/org/apache/hadoop/hbase/mapreduce/exportedTableIn94Format.
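The dependence on that committed binary can be sketched as follows: a file under src/test/resources ends up on the test classpath, so a test can probe for it with the class loader and skip (or fail) when only the text patch was applied. This is a hypothetical, self-contained illustration, not the actual TestImportExport code; the resource name is taken from the path above.

```java
import java.io.InputStream;

public class ResourceSketch {
    // Returns true when the fixture is on the classpath, i.e. the binary was
    // committed under src/test/resources/... and copied to test-classes.
    static boolean fixturePresent(String name) throws Exception {
        try (InputStream in =
                ResourceSketch.class.getClassLoader().getResourceAsStream(name)) {
            return in != null;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = "org/apache/hadoop/hbase/mapreduce/exportedTableIn94Format";
        // In a QA run applied from the text patch alone, the binary is absent.
        System.out.println(fixturePresent(name) ? "fixture found" : "fixture missing");
    }
}
```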

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9895-v2.patch

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895-v2.patch, 
> hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Commented] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-08 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818007#comment-13818007
 ] 

Jeffrey Zhong commented on HBASE-9895:
--

{quote}
The sample file is small. Can we add it as a src/test/resource ?
{quote}
The binary data file is added under 
src/test/resources/org/apache/hadoop/hbase/mapreduce/exportedTableIn94Format. 
As I mentioned above, a patch file can't contain binary files, so I have to 
attach the binary file separately and the QA run will fail. When committing the 
change, I'll commit the binary under 
src/test/resources/org/apache/hadoop/hbase/mapreduce/exportedTableIn94Format 
along with the text patch file, so there won't be any issue. 

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895-v2.patch, 
> hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Commented] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-09 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818372#comment-13818372
 ] 

Jeffrey Zhong commented on HBASE-9895:
--

That's a typo and should be INPUT_FORMAT_VER. I'll fix it and add a comment 
when I check in the patch. Thanks for the good catch!

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895-v2.patch, 
> hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Updated] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-11 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-9895:
-

Attachment: hbase-9895-v3.patch

Thanks [~ndimiduk] for the good catch. The missing test case is added back. The 
temporary file is cleaned up after each test.

The v3 patch incorporates [~te...@apache.org]'s and [~ndimiduk]'s feedback. Thanks.

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895-v2.patch, 
> hbase-9895-v3.patch, hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.





[jira] [Commented] (HBASE-9895) 0.96 Import utility can't import an exported file from 0.94

2013-11-11 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819595#comment-13819595
 ] 

Jeffrey Zhong commented on HBASE-9895:
--

Thanks for all the reviews! [~enis] I'll rename the config setting to 
"hbase.import.version" when I commit the change.  
[~saint@gmail.com] Yeah, I'll add a release note for this. Thanks.
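How such a version setting could drive the codec choice can be sketched as below: the "hbase.import.version" key comes from the comment above, but the selection helper, the codec names, and the plain-Map stand-in for a Configuration are all hypothetical, not the actual Import code.

```java
import java.util.HashMap;
import java.util.Map;

public class ImportVersionSketch {
    // Config key name taken from the discussion above.
    static final String VERSION_CONF_KEY = "hbase.import.version";

    // Hypothetical selector: a 0.94 export selects the legacy Writable-based
    // codec; anything else defaults to the protobuf-based 0.96 codec.
    static String chooseCodec(Map<String, String> conf) {
        String ver = conf.getOrDefault(VERSION_CONF_KEY, "");
        return ver.startsWith("0.94") ? "writable-0.94" : "protobuf-0.96";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseCodec(conf));   // prints protobuf-0.96
        conf.put(VERSION_CONF_KEY, "0.94");
        System.out.println(chooseCodec(conf));   // prints writable-0.94
    }
}
```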

> 0.96 Import utility can't import an exported file from 0.94
> ---
>
> Key: HBASE-9895
> URL: https://issues.apache.org/jira/browse/HBASE-9895
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 0.96.0
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: exportedTableIn94Format, hbase-9895-v2.patch, 
> hbase-9895-v3.patch, hbase-9895.patch
>
>
> Basically we PBed org.apache.hadoop.hbase.client.Result, so a 0.96 cluster 
> cannot import 0.94 exported files. This issue is annoying because a user 
> can't import his old archive files after an upgrade, or archives from others 
> who are still using 0.94.
> The ideal way is to catch the deserialization error and then fall back to the 
> 0.94 format for importing.




