[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674308#comment-16674308
 ] 

Hudson commented on HBASE-21425:


Results for branch branch-2.1
[build #576 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/576/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/576//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/576//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/576//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from ZooKeeper to the hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: hbase:namespace
>   at org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above 
> exception is that procedures are not scheduled; i.e. the namespace table, for 
> one, is not assigned.
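
The trace above passes through a stream lambda in processOfflineRegions that asks whether each region's table is enabled. The following is a minimal sketch of how a missing table state could leave a region unscheduled, assuming (the report does not confirm this) that a table with no recorded state is treated as "not enabled" and its regions are filtered out. Every class, method, and parameter name here is hypothetical and does not mirror HBase's real code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these names are real HBase APIs.
final class OfflineRegionFilterSketch {

  /**
   * Returns the offline regions that would be scheduled for assignment.
   *
   * @param regionToTable offline region name -> owning table name
   * @param tableEnabled table name -> enabled flag; a missing key models
   *     table state that was never migrated
   * @param missingMeansEnabled assumed policy for a table with no recorded state
   */
  static List<String> regionsToAssign(Map<String, String> regionToTable,
      Map<String, Boolean> tableEnabled, boolean missingMeansEnabled) {
    return regionToTable.entrySet().stream()
        // A region survives the filter only if its table counts as enabled.
        .filter(e -> tableEnabled.getOrDefault(e.getValue(), missingMeansEnabled))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, String> offline = new LinkedHashMap<>();
    offline.put("namespace-region-1", "hbase:namespace");
    offline.put("usertable-region-1", "usertable");

    // State exists only for usertable; hbase:namespace has no entry.
    Map<String, Boolean> state = new LinkedHashMap<>();
    state.put("usertable", true);

    // Missing state treated as "not enabled": the namespace region is skipped.
    System.out.println(regionsToAssign(offline, state, false));
    // prints [usertable-region-1]

    // Missing state treated as "enabled": both regions are scheduled.
    System.out.println(regionsToAssign(offline, state, true));
    // prints [namespace-region-1, usertable-region-1]
  }
}
```

The second call only illustrates an alternative policy; it is not a description of the committed patch.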



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674297#comment-16674297
 ] 

Hudson commented on HBASE-21425:


Results for branch branch-2.0
[build #1056 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1056/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1056//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1056//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1056//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.







[jira] [Comment Edited] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-03 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674284#comment-16674284
 ] 

Allan Yang edited comment on HBASE-19953 at 11/4/18 4:11 AM:
-

{quote}
In reality, a synchronous API for DDL operations is super-useful – applications 
can't reasonably proceed to run if an action hasn't completed. So, I'd pose the 
question: how would we know when to say that a DDL operation is "completed 
enough"?
{quote}
For clients on 2.0 and later it is very simple: we only need to check whether 
the procedure is finished. For 1.x clients, admin.getAlterStatus() is used to 
check the status (for modify table), but since getAlterStatus is deprecated 
in 2.x, we need to make the 1.x client wait synchronously.
{quote}
I am -1 on just reverting this. 
{quote}
After thinking it over carefully, I don't think we need to revert this (I've 
already changed the comment above); we only need to turn ModifyTable into an 
async op (it is the only sync DDL for the 2.x client now). [~elserj] you can 
see HMaster.truncateTable. We also use ProcedurePrepareLatch.createLatch(2, 0) 
to make sure the 2.x client won't sync wait here.
Uploaded an addendum to clarify my point.




was (Author: allan163):
{quote}
In reality, a synchronous API for DDL operations is super-useful – applications 
can't reasonably proceed to run if an action hasn't completed. So, I'd pose the 
question: how would we know when to say that a DDL operation is "completed 
enough"?
{quote}
For clients on 2.0 and later it is very simple: we only need to check whether 
the procedure is finished. For 1.x clients, admin.getAlterStatus() is used to 
check the status (for modify table), but since getAlterStatus is deprecated 
in 2.x, we need to make the 1.x client wait synchronously.
{quote}
I am -1 on just reverting this. 
{quote}
After thinking it over carefully, I don't think we need to revert this; we only 
need to turn ModifyTable into an async op (it is the only sync DDL for the 2.x 
client now). [~elserj] you can see HMaster.truncateTable. We also use 
ProcedurePrepareLatch.createLatch(2, 0) to make sure the 2.x client won't sync 
wait here.
Uploaded an addendum to clarify my point.
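
The comment above relies on a latch that lets newer clients return immediately while older clients wait on the server side. A rough sketch of that idea follows, assuming the two arguments in createLatch(2, 0) denote the minimum client major and minor version that does not need to wait. The classes below are hypothetical stand-ins and do not reproduce HBase's ProcedurePrepareLatch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a version-gated latch; not HBase's implementation.
final class VersionGatedLatchSketch {

  interface PrepareLatch {
    /** Called by the procedure once its prepare step has finished. */
    void release();

    /** Called by the RPC handler; returns true if it may proceed. */
    boolean await(long timeoutMs) throws InterruptedException;

    boolean isBlocking();
  }

  /** For newer clients: never blocks, because they poll the procedure result. */
  static final class NoopLatch implements PrepareLatch {
    public void release() {}
    public boolean await(long timeoutMs) { return true; }
    public boolean isBlocking() { return false; }
  }

  /** For older clients: the handler waits until the procedure signals. */
  static final class BlockingLatch implements PrepareLatch {
    private final CountDownLatch latch = new CountDownLatch(1);
    public void release() { latch.countDown(); }
    public boolean await(long timeoutMs) throws InterruptedException {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
    public boolean isBlocking() { return true; }
  }

  /** Clients at or above minMajor.minMinor get the non-blocking latch. */
  static PrepareLatch createLatch(int clientMajor, int clientMinor,
      int minMajor, int minMinor) {
    boolean pollsProcedure = clientMajor > minMajor
        || (clientMajor == minMajor && clientMinor >= minMinor);
    return pollsProcedure ? new NoopLatch() : new BlockingLatch();
  }

  public static void main(String[] args) throws InterruptedException {
    PrepareLatch forNewClient = createLatch(2, 1, 2, 0);
    PrepareLatch forOldClient = createLatch(1, 4, 2, 0);
    System.out.println(forNewClient.isBlocking()); // prints false
    System.out.println(forOldClient.isBlocking()); // prints true

    forOldClient.release();                        // procedure finished preparing
    System.out.println(forOldClient.await(10));    // prints true
  }
}
```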



> Avoid calling post* hook when procedure fails
> -
>
> Key: HBASE-19953
> URL: https://issues.apache.org/jira/browse/HBASE-19953
> Project: HBase
>  Issue Type: Bug
>  Components: master, proc-v2
>Reporter: Ramesh Mani
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2, 2.0.0
>
> Attachments: HBASE-19952.001.branch-2.patch, 
> HBASE-19953.002.branch-2.patch, HBASE-19953.003.branch-2.patch, 
> HBASE-19953.branch-2.0.addendum.patch
>
>
> Ramesh pointed out a case where I think we're mishandling some post\* 
> MasterObserver hooks. Specifically, I'm looking at the deleteNamespace.
> We synchronously execute the DeleteNamespace procedure. When the user 
> provides a namespace that isn't empty, the procedure does a rollback (which 
> is just a no-op), but this doesn't propagate an exception up to the 
> NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh 
> pointing it out a bit better to me that the code executes a bit differently 
> than we actually expect.
> I think we need to double-check our post hooks and make sure we aren't 
> invoking them when the procedure actually failed. cc/ [~Apache9], [~stack].
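
The description above concerns post hooks running even though the procedure rolled back without throwing. One way to picture the guard is sketched below, assuming the caller can inspect a failure flag on the procedure's result. The types and the hook log are hypothetical and are not HBase's MasterObserver machinery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; the names do not correspond to real HBase classes.
final class PostHookGuardSketch {

  /** Outcome of a procedure that may roll back instead of throwing. */
  static final class ProcedureResult {
    final boolean failed;
    final String message;

    ProcedureResult(boolean failed, String message) {
      this.failed = failed;
      this.message = message;
    }
  }

  /**
   * Runs the pre hook, the procedure, and then the post hook only on success.
   * Hook invocations are appended to hookLog in place of real callbacks.
   */
  static ProcedureResult runWithHooks(Supplier<ProcedureResult> procedure,
      List<String> hookLog) {
    hookLog.add("pre");
    ProcedureResult result = procedure.get();
    if (result.failed) {
      // Nothing was thrown, so the failure has to be checked explicitly
      // before deciding whether the post hook may run.
      return result;
    }
    hookLog.add("post");
    return result;
  }

  public static void main(String[] args) {
    List<String> failedRun = new ArrayList<>();
    runWithHooks(() -> new ProcedureResult(true, "namespace is not empty"), failedRun);
    System.out.println(failedRun); // prints [pre]

    List<String> successfulRun = new ArrayList<>();
    runWithHooks(() -> new ProcedureResult(false, "ok"), successfulRun);
    System.out.println(successfulRun); // prints [pre, post]
  }
}
```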





[jira] [Updated] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-03 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-19953:
---
Attachment: HBASE-19953.branch-2.0.addendum.patch






[jira] [Commented] (HBASE-19953) Avoid calling post* hook when procedure fails

2018-11-03 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674284#comment-16674284
 ] 

Allan Yang commented on HBASE-19953:


{quote}
In reality, a synchronous API for DDL operations is super-useful – applications 
can't reasonably proceed to run if an action hasn't completed. So, I'd pose the 
question: how would we know when to say that a DDL operation is "completed 
enough"?
{quote}
For clients on 2.0 and later it is very simple: we only need to check whether 
the procedure is finished. For 1.x clients, admin.getAlterStatus() is used to 
check the status (for modify table), but since getAlterStatus is deprecated 
in 2.x, we need to make the 1.x client wait synchronously.
{quote}
I am -1 on just reverting this. 
{quote}
After thinking it over carefully, I don't think we need to revert this; we only 
need to turn ModifyTable into an async op (it is the only sync DDL for the 2.x 
client now). [~elserj] you can see HMaster.truncateTable. We also use 
ProcedurePrepareLatch.createLatch(2, 0) to make sure the 2.x client won't sync 
wait here.
Uploaded an addendum to clarify my point.



> Avoid calling post* hook when procedure fails
> -
>
> Key: HBASE-19953
> URL: https://issues.apache.org/jira/browse/HBASE-19953
> Project: HBase
>  Issue Type: Bug
>  Components: master, proc-v2
>Reporter: Ramesh Mani
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 2.0.0-beta-2, 2.0.0
>
> Attachments: HBASE-19952.001.branch-2.patch, 
> HBASE-19953.002.branch-2.patch, HBASE-19953.003.branch-2.patch
>
>
> Ramesh pointed out a case where I think we're mishandling some post\* 
> MasterObserver hooks. Specifically, I'm looking at the deleteNamespace.
> We synchronously execute the DeleteNamespace procedure. When the user 
> provides a namespace that isn't empty, the procedure does a rollback (which 
> is just a no-op), but this doesn't propagate an exception up to the 
> NonceProcedureRunnable in {{HMaster#deleteNamespace}}. It took Ramesh 
> pointing it out a bit better to me that the code executes a bit differently 
> than we actually expect.
> I think we need to double-check our post hooks and make sure we aren't 
> invoking them when the procedure actually failed. cc/ [~Apache9], [~stack].





[jira] [Updated] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-03 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21421:
---
Attachment: HBASE-21421.branch-2.0.002.patch

> Do not kill RS if reportOnlineRegions fails
> ---
>
> Key: HBASE-21421
> URL: https://issues.apache.org/jira/browse/HBASE-21421
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21421.branch-2.0.001.patch, 
> HBASE-21421.branch-2.0.002.patch
>
>
> In the periodic regionServerReport from RS to master, we call 
> master.getAssignmentManager().reportOnlineRegions() to make sure the RS has 
> the same state as the Master. If the RS holds a region which the master thinks 
> should be on another RS, the Master will kill the RS.
> But the regionServerReport could be lagging (due to network or something), 
> in which case it can't represent the current state of the RegionServer. 
> Besides, when onlining a region we call reportRegionStateTransition and retry 
> forever until it is successfully reported to the master. We can count on the 
> reportRegionStateTransition calls.
> I have encountered cases where the regions are closed on the RS and 
> reportRegionStateTransition to the master succeeds. But later, a lagging 
> regionServerReport tells the master the region is online on the RS (which it 
> is not at the moment; this call may have been generated some time ago and 
> delayed by the network somehow). Then the master thinks the region should be 
> on another RS and kills the RS, which should not happen.
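
The race described above can be replayed in a small simulation. This is only a sketch under the assumption that the master keeps one expected server per region and compares each periodic report against it. The class, the Action enum, and the killOnMismatch switch are hypothetical and are not taken from the attached patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a lagging periodic report; not HBase code.
final class LaggingReportSketch {

  enum Action { NONE, WARN, KILL_SERVER }

  private final Map<String, String> regionToServer = new HashMap<>();
  private final boolean killOnMismatch;

  LaggingReportSketch(boolean killOnMismatch) {
    this.killOnMismatch = killOnMismatch;
  }

  /** Stands in for the transition report that is retried until acknowledged. */
  void reportRegionStateTransition(String region, String server) {
    regionToServer.put(region, server);
  }

  /** Stands in for the periodic report, which may describe an older moment. */
  Action reportOnlineRegions(String server, List<String> onlineRegions) {
    for (String region : onlineRegions) {
      String expected = regionToServer.get(region);
      if (expected != null && !expected.equals(server)) {
        // The report disagrees with the master's view of this region.
        return killOnMismatch ? Action.KILL_SERVER : Action.WARN;
      }
    }
    return Action.NONE;
  }

  static Action replay(boolean killOnMismatch) {
    LaggingReportSketch master = new LaggingReportSketch(killOnMismatch);
    master.reportRegionStateTransition("r1", "rs1");  // r1 opened on rs1
    // rs1 builds its periodic report now, but the report is delayed in transit.
    List<String> delayedReport = Collections.singletonList("r1");
    master.reportRegionStateTransition("r1", "rs2");  // r1 closed on rs1, opened on rs2
    return master.reportOnlineRegions("rs1", delayedReport);
  }

  public static void main(String[] args) {
    System.out.println(replay(true));  // prints KILL_SERVER
    System.out.println(replay(false)); // prints WARN
  }
}
```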





[jira] [Commented] (HBASE-21423) Procedures for meta table/region should be able to execute in separate workers

2018-11-03 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674273#comment-16674273
 ] 

Allan Yang commented on HBASE-21423:


Ping [~stack]

> Procedures for meta table/region should be able to execute in separate 
> workers 
> ---
>
> Key: HBASE-21423
> URL: https://issues.apache.org/jira/browse/HBASE-21423
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21423.branch-2.0.001.patch, 
> HBASE-21423.branch-2.0.002.patch
>
>
> We have higher priority for meta table procedures, but only at the queue level. 
> There is a case where the meta table is closed and an AssignProcedure (or RTSP 
> in branch-2+) is waiting to be executed, but at the same time all the 
> worker threads are executing procedures that need to write to the meta table. 
> Then all the workers are stuck retrying their writes to meta, and no worker 
> will take the AP for meta.
> We do have a mechanism that detects the stuck state and adds more 
> ''KeepAlive'' workers to the pool to resolve it, but by then it has already 
> been stuck for a long time.
> This is a real case I encountered in ITBLL.
> So, I add one 'Urgent worker' to the ProcedureExecutor, which only takes meta 
> procedures (other workers can take meta procedures too), which can resolve 
> this kind of stuck state.
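
The starvation case and the reserved worker can be sketched without threads. The sketch assumes a single queue in which meta procedures are placed at the front, plus a separate poll method for the urgent worker. All names are hypothetical and the real ProcedureExecutor and scheduler are considerably more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical, single-threaded sketch of a queue with a reserved meta worker.
final class UrgentWorkerSketch {

  static final class Proc {
    final String name;
    final boolean meta;

    Proc(String name, boolean meta) {
      this.name = name;
      this.meta = meta;
    }
  }

  private final Deque<Proc> queue = new ArrayDeque<>();

  /** Queue-level priority: meta procedures go to the front. */
  void submit(Proc p) {
    if (p.meta) {
      queue.addFirst(p);
    } else {
      queue.addLast(p);
    }
  }

  /** Normal workers take whatever is at the head, meta or not. */
  Proc pollForNormalWorker() {
    return queue.pollFirst();
  }

  /** The urgent worker takes only meta procedures and leaves the rest queued. */
  Proc pollForUrgentWorker() {
    Iterator<Proc> it = queue.iterator();
    while (it.hasNext()) {
      Proc p = it.next();
      if (p.meta) {
        it.remove();
        return p;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    UrgentWorkerSketch scheduler = new UrgentWorkerSketch();
    scheduler.submit(new Proc("userA", false));
    scheduler.submit(new Proc("userB", false));

    // Both normal workers pick up user procedures and then block on meta writes.
    System.out.println(scheduler.pollForNormalWorker().name); // prints userA
    System.out.println(scheduler.pollForNormalWorker().name); // prints userB

    // The meta assign arrives while every normal worker is busy.
    scheduler.submit(new Proc("assignMeta", true));
    scheduler.submit(new Proc("userC", false));

    // Only the reserved worker is free, and it accepts the meta procedure.
    System.out.println(scheduler.pollForUrgentWorker().name); // prints assignMeta
    // It does not take the remaining user procedure.
    System.out.println(scheduler.pollForUrgentWorker());      // prints null
  }
}
```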





[jira] [Updated] (HBASE-21423) Procedures for meta table/region should be able to execute in separate workers

2018-11-03 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21423:
---
Attachment: HBASE-21423.branch-2.0.002.patch






[jira] [Commented] (HBASE-21406) "status 'replication'" should not show SINK if the cluster does not act as sink

2018-11-03 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674255#comment-16674255
 ] 

Wellington Chevreuil commented on HBASE-21406:
--

Added an initial patch proposal for *branch-1*. The idea here is to not show 
stats for SINK until it has received some edits. Also added an additional 
metric showing the sink startup time, something like below:
{noformat}
SINK  : TimeStampStarted=1541292912227, Waiting for OPs...{noformat}
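
A small sketch of how such a SINK line might be chosen, assuming the sink keeps a count of applied batches to tell whether it has received any edits yet. That counter, the method name, and its parameters are assumptions for illustration and are not taken from the patch.

```java
// Hypothetical sketch of formatting the SINK line; not code from the patch.
final class SinkStatusSketch {

  /**
   * @param timeStampStarted when the sink started, in epoch millis
   * @param appliedBatches assumed counter of batches applied so far
   * @param ageOfLastAppliedOp age of the last applied op, in millis
   * @param timeStampOfLastAppliedOp when the last op was applied, in epoch millis
   */
  static String formatSink(long timeStampStarted, long appliedBatches,
      long ageOfLastAppliedOp, long timeStampOfLastAppliedOp) {
    if (appliedBatches == 0) {
      // No edits received yet, so the usual stats would be meaningless.
      return "SINK  : TimeStampStarted=" + timeStampStarted + ", Waiting for OPs...";
    }
    return "SINK  : AgeOfLastAppliedOp=" + ageOfLastAppliedOp
        + ", TimeStampsOfLastAppliedOp=" + timeStampOfLastAppliedOp;
  }

  public static void main(String[] args) {
    System.out.println(formatSink(1541292912227L, 0, 0, 0));
    // prints SINK  : TimeStampStarted=1541292912227, Waiting for OPs...
    System.out.println(formatSink(1541292912227L, 3, 70, 1541292999000L));
    // prints SINK  : AgeOfLastAppliedOp=70, TimeStampsOfLastAppliedOp=1541292999000
  }
}
```

The second line prints the raw epoch value rather than a formatted date to keep the sketch independent of time zone settings.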
 
BTW, while testing, I noticed additional issues with the source metrics on the 
current branch-1 version:
1) Once started and while no OP eligible for replication occurs, 
TimeStampsOfLastShippedOp shows "Thu Jan 01 01:00:00 GMT 1970", and a huge 
Replication Lag is reported. This seems to be due to HBASE-15995, which removed 
the code in the ReplicationSource class that initializes AgeOfLastShippedOp to 
the startup time:

{noformat}
-      // Reset the sleep multiplier if nothing has actually gone wrong
-      if (!gotIOE) {
-        sleepMultiplier = 1;
-        // if there was nothing to ship and it's not an error
-        // set "ageOfLastShippedOp" to <now> to indicate that we're current
-        metrics.setAgeOfLastShippedOp(EnvironmentEdgeManager.currentTime(), walGroupId);
+      WALEntryBatch entryBatch = entryReader.take();
+      for (Map.Entry<String, Long> entry : entryBatch.getLastSeqIds().entrySet()) {
+        waitingUntilCanPush(entry);
{noformat}

2) After the source gets OPs to replicate and successfully ships them to the 
target, the source metrics keep showing lag, even if there were no new edits to 
replicate. This is also wrong, and was apparently introduced by changes from 
HBASE-15093, which modified the way the log queue size is accounted; the 
replication lag calculation logic in ReplicationLoad seems to rely on the log 
queue size:
{noformat}
  long ageOfLastShippedOp = sm.getAgeOfLastShippedOp();
  int sizeOfLogQueue = sm.getSizeOfLogQueue();
  long timeStampOfLastShippedOp = sm.getTimeStampOfLastShippedOp();
  long replicationLag;
  long timePassedAfterLastShippedOp =
      EnvironmentEdgeManager.currentTime() - timeStampOfLastShippedOp;
  if (sizeOfLogQueue != 0) {
    // err on the large side
    replicationLag = Math.max(ageOfLastShippedOp, timePassedAfterLastShippedOp);
  } else if (timePassedAfterLastShippedOp < 2 * ageOfLastShippedOp) {
    replicationLag = ageOfLastShippedOp; // last shipped happen recently
  } else {
    // last shipped may happen last night,
    // so NO real lag although ageOfLastShippedOp is non-zero
    replicationLag = 0;
  }
{noformat}

I'll be opening another jira to fix the source metrics issues mentioned above.
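
To make the two source-side symptoms concrete, the lag calculation quoted above can be lifted into a pure function and fed sample values. This is a worked sketch only: the function name and the `now` parameter are introduced here, and the assumption that the log queue size stays non-zero while idle follows the comment's reading of HBASE-15093 rather than anything verified here.

```java
// Worked example of the quoted lag logic as a pure function (hypothetical wrapper).
final class ReplicationLagSketch {

  static long replicationLag(long ageOfLastShippedOp, int sizeOfLogQueue,
      long timeStampOfLastShippedOp, long now) {
    long timePassedAfterLastShippedOp = now - timeStampOfLastShippedOp;
    if (sizeOfLogQueue != 0) {
      // err on the large side
      return Math.max(ageOfLastShippedOp, timePassedAfterLastShippedOp);
    } else if (timePassedAfterLastShippedOp < 2 * ageOfLastShippedOp) {
      return ageOfLastShippedOp; // last ship happened recently
    } else {
      return 0; // last ship was long ago, so there is no real lag
    }
  }

  public static void main(String[] args) {
    long now = 1541292912227L;

    // Issue 1: nothing shipped yet, so the timestamp is still 0 (the epoch).
    // With a non-empty queue the lag equals the whole time since 1970.
    System.out.println(replicationLag(0, 1, 0, now)); // prints 1541292912227

    // Issue 2: last ship was 5 s ago and took 100 ms. If the queue size is
    // still reported as 1, the lag keeps growing with wall-clock time.
    System.out.println(replicationLag(100, 1, now - 5000, now)); // prints 5000

    // Same situation with an empty queue: no lag is reported.
    System.out.println(replicationLag(100, 0, now - 5000, now)); // prints 0
  }
}
```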

> "status 'replication'" should not show SINK if the cluster does not act as 
> sink
> ---
>
> Key: HBASE-21406
> URL: https://issues.apache.org/jira/browse/HBASE-21406
> Project: HBase
>  Issue Type: Improvement
>Reporter: Daisuke Kobayashi
>Assignee: Wellington Chevreuil
>Priority: Minor
> Attachments: HBASE-21406-branch-1.001.patch, Screen Shot 2018-10-31 
> at 18.12.54.png
>
>
> When replicating one way, from source to target, {{status 'replication'}} on 
> source always dumps SINK with meaningless metrics. It only makes sense when 
> running the command on the target cluster.
> For example, with {{status 'replication'}} on source, {{AgeOfLastAppliedOp}} is 
> always zero and {{TimeStampsOfLastAppliedOp}} does not get updated from the 
> time the RS started, since it's not acting as sink.
> {noformat}
> source-1.com
>SOURCE: PeerID=1, AgeOfLastShippedOp=0, SizeOfLogQueue=0, TimeStampsOfLastShippedOp=Mon Oct 29 23:44:14 PDT 2018, Replication Lag=0
>SINK  : AgeOfLastAppliedOp=0, TimeStampsOfLastAppliedOp=Thu Oct 25 23:56:53 PDT 2018
> {noformat}
> {{status 'replication'}} on target works as expected. SOURCE is empty as it's 
> not acting as source:
> {noformat}
> target-1.com
>SOURCE:
>SINK  : AgeOfLastAppliedOp=70, TimeStampsOfLastAppliedOp=Mon Oct 29 23:44:08 PDT 2018
> {noformat}
> This is because {{getReplicationLoadSink}}, called in {{admin.rb}}, always 
> returns a value (not null).
> 1.X
> https://github.com/apache/hbase/blob/rel/1.4.0/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerLoad.java#L194-L204
> 2.X
> https://github.com/apache/hbase/blob/rel/2.0.0/hbase-client/src/main/java/org/apache/hadoop/hbase/ServerLoad.java#L392-L399





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674252#comment-16674252
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #38 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/38/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/38//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/38//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/38//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup & restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
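
As a conversation starter only, here is one possible shape for the primitive calls the description asks about: append, make durable, and read back from an offset for tailing consumers such as replication. The interface, its method names, and the in-memory implementation are entirely hypothetical and do not reflect any design agreed on this issue or any existing HBase interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a minimal WAL contract; not an HBase or Ratis API.
final class WalApiSketch {

  interface Wal {
    /** Appends an entry and returns its sequence id. */
    long append(byte[] entry);

    /** Returns once every entry with id <= upToSeqId is durable. */
    void sync(long upToSeqId);

    /** Returns durable entries with id >= fromSeqId, for tailing consumers. */
    List<byte[]> tail(long fromSeqId);
  }

  /** Toy implementation where "durable" just means "marked as synced". */
  static final class InMemoryWal implements Wal {
    private final List<byte[]> entries = new ArrayList<>();
    private long syncedUpTo = -1;

    public synchronized long append(byte[] entry) {
      entries.add(entry.clone());
      return entries.size() - 1;
    }

    public synchronized void sync(long upToSeqId) {
      if (upToSeqId >= entries.size()) {
        throw new IllegalArgumentException("unknown sequence id " + upToSeqId);
      }
      syncedUpTo = Math.max(syncedUpTo, upToSeqId);
    }

    public synchronized List<byte[]> tail(long fromSeqId) {
      List<byte[]> out = new ArrayList<>();
      for (long i = Math.max(0, fromSeqId); i <= syncedUpTo; i++) {
        out.add(entries.get((int) i));
      }
      return out;
    }
  }

  public static void main(String[] args) {
    Wal wal = new InMemoryWal();
    long first = wal.append("put row1".getBytes());
    wal.append("put row2".getBytes());

    wal.sync(first);                          // only the first entry is durable
    System.out.println(wal.tail(0).size());   // prints 1

    wal.sync(1);                              // now both entries are durable
    System.out.println(wal.tail(0).size());   // prints 2
    System.out.println(wal.tail(1).size());   // prints 1
  }
}
```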





[jira] [Updated] (HBASE-21406) "status 'replication'" should not show SINK if the cluster does not act as sink

2018-11-03 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21406:
-
Attachment: HBASE-21406-branch-1.001.patch






[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674240#comment-16674240
 ] 

Hudson commented on HBASE-21425:


Results for branch branch-2
[build #1482 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1482/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1482//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1482//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1482//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from zookeeper to hbase:meta table. This is exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] 
> master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> hbase:namespace
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above 
> exception is that procedures are not scheduled; the namespace table, for one, 
> is not assigned.
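
One reading of the trace above, sketched in Java: if the enabled-check swallows the missing-state failure and answers false, the stream filter in the caller quietly drops that table from the list to assign. Everything here (`OfflineRegionsSketch`, `StateNotFoundException`, `tablesToAssign`) is a hypothetical stand-in, not HBase's `TableStateManager` or `AssignmentManager`, and whether the real code path behaves exactly this way is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in; not the HBase classes named in the stack trace.
class OfflineRegionsSketch {
  static class StateNotFoundException extends Exception {
    StateNotFoundException(String table) {
      super(table);
    }
  }

  // Table name -> enabled flag. A table with no entry models state that was
  // never migrated.
  private final Map<String, Boolean> enabledByTable = new HashMap<>();

  void recordState(String table, boolean enabled) {
    enabledByTable.put(table, enabled);
  }

  // Throws when no state is recorded for the table.
  boolean getState(String table) throws StateNotFoundException {
    Boolean enabled = enabledByTable.get(table);
    if (enabled == null) {
      throw new StateNotFoundException(table);
    }
    return enabled;
  }

  // Assumption: the check logs the failure and answers false.
  boolean isEnabled(String table) {
    try {
      return getState(table);
    } catch (StateNotFoundException e) {
      System.err.println("Unable to get table " + table + " state");
      return false;
    }
  }

  // Tables answered "false" are filtered out, so nothing is scheduled for them.
  List<String> tablesToAssign(List<String> offlineTables) {
    return offlineTables.stream().filter(this::isEnabled).collect(Collectors.toList());
  }
}
```

Under these assumptions, a table whose state is missing is indistinguishable from a disabled one at the call site, which would explain why no assign is scheduled for it.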



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21425:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0, 3.0.0
       Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+. Thanks for the reviews, [~allan163] (and [~Apache9]).

> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from zookeeper to hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] 
> master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> hbase:namespace
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above 
> exception is that procedures are not scheduled; the namespace table, for one, 
> is not assigned.





[jira] [Commented] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674032#comment-16674032
 ] 

Hudson commented on HBASE-21351:


Results for branch branch-2.0
[build #1054 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1054/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1054//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1054//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1054//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force updating its 
> state to prevent a race with the PE worker, but this does not work when the 
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED 
> and then start to roll back the whole procedure stack. We pop every 
> procedure in the stack and try to roll it back. So we may change the state 
> of a procedure without holding its procExecutionLock when rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures.
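
A minimal Java sketch of the locking idea described above, assuming one lock per procedure. The names (`RollbackLockSketch`, `forceUpdate`, `rollback`) are hypothetical and this is not the patch attached to the issue; it only illustrates why the rollback path has to take the same lock as the force-update path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a single procedure; not HBase's Procedure class.
class RollbackLockSketch {
  enum State { RUNNABLE, FAILED, ROLLEDBACK }

  // Stands in for the per-procedure execution lock.
  private final ReentrantLock execLock = new ReentrantLock();
  private State state = State.RUNNABLE;
  // Stands in for whatever the force-update thread writes to the store.
  private final List<State> persisted = new ArrayList<>();

  // Force-update path: persist the current state while holding the lock.
  void forceUpdate() {
    execLock.lock();
    try {
      persisted.add(state);
    } finally {
      execLock.unlock();
    }
  }

  // Rollback path taking the same lock, so forceUpdate() cannot run between
  // the two assignments and persist the intermediate FAILED state.
  void rollback() {
    execLock.lock();
    try {
      state = State.FAILED;
      state = State.ROLLEDBACK;
    } finally {
      execLock.unlock();
    }
  }

  List<State> persisted() {
    return persisted;
  }
}
```

If `rollback()` skipped the lock, a concurrent `forceUpdate()` could record FAILED for a procedure that is about to become ROLLEDBACK, which is the kind of intermediate state the description warns about.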





[jira] [Commented] (HBASE-21422) NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674018#comment-16674018
 ] 

Hudson commented on HBASE-21422:


Results for branch master
[build #583 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/583/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> NPE in TestMergeTableRegionsProcedure.testMergeWithoutPONR
> --
>
> Key: HBASE-21422
> URL: https://issues.apache.org/jira/browse/HBASE-21422
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21422-v1.patch, HBASE-21422-v1.patch, 
> HBASE-21422.patch
>
>
> {noformat}
> 2018-10-31 16:22:01,302 ERROR [Time-limited test] 
> assignment.TestMergeTableRegionsProcedure(305): error!
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:386)
>   at 
> org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.getStateId(MergeTableRegionsProcedure.java:84)
>   at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.getCurrentStateId(StateMachineProcedure.java:276)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureTestingUtility.testRecoveryAndDoubleExecution(MasterProcedureTestingUtility.java:414)
>   at 
> org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeWithoutPONR(TestMergeTableRegionsProcedure.java:296)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}





[jira] [Commented] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674020#comment-16674020
 ] 

Hudson commented on HBASE-21351:


Results for branch master
[build #583 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/583/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force updating its 
> state to prevent a race with the PE worker, but this does not work when the 
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED 
> and then start to roll back the whole procedure stack. We pop every 
> procedure in the stack and try to roll it back. So we may change the state 
> of a procedure without holding its procExecutionLock when rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures.





[jira] [Commented] (HBASE-21407) Resolve NPE in backup Master UI

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674019#comment-16674019
 ] 

Hudson commented on HBASE-21407:


Results for branch master
[build #583 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/583/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/583//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Resolve NPE in backup Master UI 
> 
>
> Key: HBASE-21407
> URL: https://issues.apache.org/jira/browse/HBASE-21407
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Jingyun Tian
>Assignee: Jingyun Tian
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: hbase-21407.master.001.patch, 
> hbase-21407.master.001.patch, hbase-21407.master.001.patch
>
>
> Since some pages of our UI are using jsp instead of jamon, the fix of 
> HBASE-18263 is not enough. Added the fix of HBASE-18263 to the header.jsp.





[jira] [Commented] (HBASE-21351) The force update thread may have race with PE worker when the procedure is rolling back

2018-11-03 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674015#comment-16674015
 ] 

Hudson commented on HBASE-21351:


Results for branch branch-2
[build #1481 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1481/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1481//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1481//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1481//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> The force update thread may have race with PE worker when the procedure is 
> rolling back
> ---
>
> Key: HBASE-21351
> URL: https://issues.apache.org/jira/browse/HBASE-21351
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
> Attachments: HBASE-21351-v1.patch, HBASE-21351-v1.patch, 
> HBASE-21351-v2.patch, HBASE-21351.patch
>
>
> We acquire the procExecutionLock for a procedure when force updating its 
> state to prevent a race with the PE worker, but this does not work when the 
> procedure is rolling back.
> If a procedure fails, we mark the root procedure stack as FAILED 
> and then start to roll back the whole procedure stack. We pop every 
> procedure in the stack and try to roll it back. So we may change the state 
> of a procedure without holding its procExecutionLock when rolling back.
> This means we may persist an intermediate state of a procedure and cause 
> corruption when loading procedures.





[jira] [Commented] (HBASE-21035) Meta Table should be able to online even if all procedures are lost

2018-11-03 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673987#comment-16673987
 ] 

Allan Yang commented on HBASE-21035:


[~stack], no, it is not. The reason is that I stopped the RS first, leaving the 
splitting WAL dir there, and now master 'thinks' there is always an SCP for the 
splitting WAL, so no one will process these down RS dirs, and no one will bring 
any regions online.

> Meta Table should be able to online even if all procedures are lost
> ---
>
> Key: HBASE-21035
> URL: https://issues.apache.org/jira/browse/HBASE-21035
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21035.branch-2.0.001.patch, 
> HBASE-21035.branch-2.1.001.patch
>
>
> After HBASE-20708, we changed the way we init after master starts. It will 
> only check WAL dirs and compare them to Zookeeper RS nodes to decide which 
> servers need to expire. For servers whose dir ends with 'SPLITTING', we 
> assume that there will be an SCP for it.
> But if the server with the meta region crashed before master restarts, and 
> if all the procedure WALs are lost (due to a bug, or deleted manually, 
> whatever), the newly restarted master will be stuck when initing, since no 
> one will bring the meta region online.
> Although it is an anomalous case, I think that no matter what happens, we 
> need to online the meta region. Otherwise, we are sitting ducks; nothing can 
> be done.
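
The startup decision described above can be sketched roughly as follows, in Java. The class, the method, and the exact suffix string are assumptions made for illustration, not HBase code; WAL dir names are simplified to bare server names.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the decision; not the HBase master code.
class ExpireDecisionSketch {
  // Assumed marker for a WAL dir whose logs are being split.
  static final String SPLITTING_SUFFIX = "-splitting";

  // Returns the servers that should be expired: those with a WAL dir but no
  // live ZooKeeper node. Dirs carrying the splitting suffix are skipped on
  // the assumption that an SCP already exists for them.
  static Set<String> serversToExpire(List<String> walDirs, Set<String> liveServers) {
    Set<String> expire = new TreeSet<>();
    for (String dir : walDirs) {
      if (dir.endsWith(SPLITTING_SUFFIX)) {
        continue; // nobody handles this server if its SCP was actually lost
      }
      if (!liveServers.contains(dir)) {
        expire.add(dir);
      }
    }
    return expire;
  }
}
```

In this model a server whose dir is already marked as splitting never reaches the expire set, so if the procedure that was supposed to handle it is gone, its regions (meta included) stay offline.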





[jira] [Commented] (HBASE-21425) 2.1.1 fails to start over 1.x data; namespace not assigned

2018-11-03 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673986#comment-16673986
 ] 

Allan Yang commented on HBASE-21425:


V2 looks good, +1 for it.

> 2.1.1 fails to start over 1.x data; namespace not assigned
> --
>
> Key: HBASE-21425
> URL: https://issues.apache.org/jira/browse/HBASE-21425
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21425.branch-2.1.001.patch, 
> HBASE-21425.branch-2.1.002.patch
>
>
> I tested hbase-2.1.1 starting up over data written by branch-1.4. It failed 
> because the TableStateManager, as part of its startup, failed its migration 
> of table state from zookeeper to hbase:meta table. This is the exception:
> {code}
> 2018-11-01 10:49:33,678 ERROR [master/kalashnikov:16000:becomeActiveMaster] 
> master.TableStateManager: Unable to get table hbase:namespace state
> org.apache.hadoop.hbase.master.TableStateManager$TableStateNotFoundException: 
> hbase:namespace
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:215)
>   at 
> org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:147)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableEnabled(AssignmentManager.java:327)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$processOfflineRegions$3(AssignmentManager.java:1236)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
>   at 
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.processOfflineRegions(AssignmentManager.java:1237)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.joinCluster(AssignmentManager.java:1218)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1001)
>   at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2257)
>   at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:583)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> This happens inside processOfflineRegions, so the result of the above 
> exception is that procedures are not scheduled; the namespace table, for one, 
> is not assigned.


