[jira] [Commented] (HBASE-21727) Simplify documentation around client timeout

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760515#comment-16760515
 ] 

stack commented on HBASE-21727:
---

What do I think? I think you have a point, and I should have noticed it on 
review. Want me to put the method back, [~psomogyi]?

> Simplify documentation around client timeout
> 
>
> Key: HBASE-21727
> URL: https://issues.apache.org/jira/browse/HBASE-21727
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21727.master.001.patch
>
>
> Client rpc timeouts are not easy to understand from the documentation. 
> [~stack] also had an idea to point to the documentation when the exception is thrown.
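For reference while reworking the docs, here is a sketch of the main client-side 
timeout keys this issue is about and how they layer. The key names are standard 
HBase client configuration properties; the one-line summaries are informal 
glosses for illustration, not text from the documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Informal map of the client timeout settings, from outermost to innermost. */
public class ClientTimeouts {
  public static final Map<String, String> SETTINGS = new LinkedHashMap<>();
  static {
    // Bounds one whole client operation, spanning all retries.
    SETTINGS.put("hbase.client.operation.timeout", "limit for a whole operation, across retries");
    // Bounds a single RPC attempt against a region server.
    SETTINGS.put("hbase.rpc.timeout", "limit for one RPC attempt");
    // Scanner next() calls get their own period, tied to the scanner lease.
    SETTINGS.put("hbase.client.scanner.timeout.period", "limit for scanner next() calls");
    // Retry policy that operates underneath the operation timeout.
    SETTINGS.put("hbase.client.retries.number", "maximum retry attempts");
  }

  public static void main(String[] args) {
    SETTINGS.forEach((key, meaning) -> System.out.println(key + " -> " + meaning));
  }
}
```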



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760514#comment-16760514
 ] 

stack commented on HBASE-21844:
---

On the patch:

On the rename of the method: it could go either way, but the result is a 
boolean, so isRegion... seems more appropriate than wait... (provided the 
javadoc notes that it blocks unless there is an error). This is a nit.

The added logging is no harm (a benefit, actually), though won't this get 
spewed a bunch?

1235  LOG.warn("{} state is OPEN, but the server {} is dead. 
Waiting for SCP to recover it.",
1236  ri.getRegionNameAsString(), rs.getServerName());

On this...

1238  LOG.error("{} State is OPEN, but the server {} is not online 
and no SCP is scheduled. Expiring the server.",
1239  ri.getRegionNameAsString(), rs.getServerName());
1240  this.getServerManager().expireServer(rs.getServerName());

... we could already be processing the dead server? You could check.

Yeah, if there is no SCP for this server, then the above would help, but I'm 
interested in why no SCP was scheduled. That seems like the more interesting 
issue. If we are failing to schedule an SCP, or dropping one around startup, we 
should try to fix that.
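The case analysis in the two quoted hunks, plus the extra check suggested 
above, can be sketched as a small decision function. RegionCheck and Decision 
are hypothetical names for illustration; this is not the actual 
AssignmentManager code.

```java
/** Toy model of the dead-server handling discussed above (hypothetical names). */
public class RegionCheck {
  enum Decision { WAIT_FOR_SCP, ALREADY_PROCESSING, EXPIRE_SERVER }

  /** For a region whose state is OPEN but whose hosting server is not online. */
  static Decision decide(boolean scpScheduled, boolean alreadyExpiring) {
    if (scpScheduled) {
      return Decision.WAIT_FOR_SCP;       // an SCP will recover the region; just wait
    }
    if (alreadyExpiring) {
      return Decision.ALREADY_PROCESSING; // the "you could check" case: avoid a double expire
    }
    return Decision.EXPIRE_SERVER;        // no SCP and nobody processing: expire the server
  }

  public static void main(String[] args) {
    System.out.println(decide(true, false));  // WAIT_FOR_SCP
    System.out.println(decide(false, true));  // ALREADY_PROCESSING
    System.out.println(decide(false, false)); // EXPIRE_SERVER
  }
}
```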

Thank you.

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760508#comment-16760508
 ] 

stack commented on HBASE-21844:
---

bq. But I believe, we could all benefit from a resilient master that knows how 
to correct its own state.

Agree.

bq. ...in holding-pattern until region onlined

Yeah, but per [~Apache9], the above was added explicitly so we could diagnose 
how we arrived at such a state. A few of us chatting about it a while back 
thought it better to have it hold rather than try to progress, because on 
progressing we could do damage (see HBASE-21035). We added a section to the 
hbck2 doc on how to get past this hump: 
https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#master-startup-cannot-progress-in-holding-pattern-until-region-onlined
Hope this helps. We can take a look at logs too if that'd help (they can get 
cryptic around startup...).

And yeah, what version? Thanks. 

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21727) Simplify documentation around client timeout

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760503#comment-16760503
 ] 

Duo Zhang commented on HBASE-21727:
---

What do you guys think, [~stack] [~psomogyi]? Shall we add the method back for 
branches other than master, or just mark this as an incompatible change?

> Simplify documentation around client timeout
> 
>
> Key: HBASE-21727
> URL: https://issues.apache.org/jira/browse/HBASE-21727
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21727.master.001.patch
>
>
> Client rpc timeouts are not easy to understand from the documentation. 
> [~stack] also had an idea to point to the documentation when the exception is thrown.





[jira] [Updated] (HBASE-21727) Simplify documentation around client timeout

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21727:
--
Hadoop Flags: Incompatible change,Reviewed

> Simplify documentation around client timeout
> 
>
> Key: HBASE-21727
> URL: https://issues.apache.org/jira/browse/HBASE-21727
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21727.master.001.patch
>
>
> Client rpc timeouts are not easy to understand from the documentation. 
> [~stack] also had an idea to point to the documentation when the exception is thrown.





[jira] [Reopened] (HBASE-21727) Simplify documentation around client timeout

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-21727:
---

HBaseConfiguration is IA.Public and we removed a public method from it. I think 
this can only be done on master? For branch-2.x, we should keep the method and 
mark it as deprecated?

> Simplify documentation around client timeout
> 
>
> Key: HBASE-21727
> URL: https://issues.apache.org/jira/browse/HBASE-21727
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21727.master.001.patch
>
>
> Client rpc timeouts are not easy to understand from the documentation. 
> [~stack] also had an idea to point to the documentation when the exception is thrown.





[jira] [Updated] (HBASE-21727) Simplify documentation around client timeout

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21727:
--
Hadoop Flags: Reviewed  (was: Incompatible change,Reviewed)

> Simplify documentation around client timeout
> 
>
> Key: HBASE-21727
> URL: https://issues.apache.org/jira/browse/HBASE-21727
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 2.1.3, 2.0.5
>
> Attachments: HBASE-21727.master.001.patch
>
>
> Client rpc timeouts are not easy to understand from the documentation. 
> [~stack] also had an idea to point to the documentation when the exception is thrown.





[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-19616:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

It went into master clean. It didn't go back to branch-2. If you want to put up 
a patch for branch-2, I'll apply it [~belugabehr]. Meantime, resolving. Thanks 
for the nice cleanup sir.

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. It currently 
> loops with a 500ms interval to check if some sort of condition has been met 
> until the amount of time spent looping is greater than some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.
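The sleep-poll-to-latch change the description outlines can be sketched as 
follows; the names are illustrative, not the actual LogCleaner code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch: waiting on a CountDownLatch instead of polling every 500ms. */
public class LatchWaitSketch {
  /** Returns true if the worker signalled completion before the timeout. */
  static boolean runOnce(long timeoutSeconds) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      // ... the actual cleanup work would happen here ...
      done.countDown(); // signal completion immediately
    });
    worker.start();
    // Wakes as soon as countDown() runs, or when the timeout elapses;
    // no fixed polling interval, so no sleep-interval configuration is needed.
    return done.await(timeoutSeconds, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runOnce(10) ? "worker finished" : "timed out");
  }
}
```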





[jira] [Commented] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760494#comment-16760494
 ] 

stack commented on HBASE-19616:
---

[~belugabehr] No harm. Yeah, I was wondering why the check is no longer needed. 
Are you confident it is never null?

Let me push this.

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. It currently 
> loops with a 500ms interval to check if some sort of condition has been met 
> until the amount of time spent looping is greater than some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760492#comment-16760492
 ] 

Duo Zhang commented on HBASE-21844:
---

BTW, what is the HBase version you are running?

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760490#comment-16760490
 ] 

Duo Zhang commented on HBASE-21844:
---

You cannot code for unknown problems; unknown bugs and problems can only be 
fixed by external tools, backups, etc. The problem here is that we need to know 
why the SCP could finish without scheduling a procedure to bring the meta 
region online.

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21845) Make a 2.0.5 release

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760487#comment-16760487
 ] 

stack commented on HBASE-21845:
---

Nightlies are passing about 50% of the time: 
https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.0/

Looking at dashboard, 
https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.0/lastSuccessfulBuild/artifact/dashboard.html
 ... there are a few uglies that fail frequently. Let me take a look at a 
few...

> Make a 2.0.5 release
> 
>
> Key: HBASE-21845
> URL: https://issues.apache.org/jira/browse/HBASE-21845
> Project: HBase
>  Issue Type: Bug
>  Components: release
>Affects Versions: 2.0.5
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.5
>
>
> Make a release of 2.0.5 off branch-2.0.





[jira] [Commented] (HBASE-21845) Make a 2.0.5 release

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760488#comment-16760488
 ] 

Duo Zhang commented on HBASE-21845:
---

Dup with HBASE-21802?

> Make a 2.0.5 release
> 
>
> Key: HBASE-21845
> URL: https://issues.apache.org/jira/browse/HBASE-21845
> Project: HBase
>  Issue Type: Bug
>  Components: release
>Affects Versions: 2.0.5
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.0.5
>
>
> Make a release of 2.0.5 off branch-2.0.





[jira] [Created] (HBASE-21845) Make a 2.0.5 release

2019-02-04 Thread stack (JIRA)
stack created HBASE-21845:
-

 Summary: Make a 2.0.5 release
 Key: HBASE-21845
 URL: https://issues.apache.org/jira/browse/HBASE-21845
 Project: HBase
  Issue Type: Bug
  Components: release
Affects Versions: 2.0.5
Reporter: stack
Assignee: stack
 Fix For: 2.0.5


Make a release of 2.0.5 off branch-2.0.





[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760458#comment-16760458
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #68 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/68/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/68//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/68//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/68//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use case of "tail"ing the WAL, which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted on", such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like 
> {{WALSplitter}}) should also be looked at so they use WAL APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.
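One hypothetical shape such a minimal WAL surface could take (append, sync for 
durability, and a tailing read for the replication use case), with a toy 
in-memory implementation. None of these names come from HBase; this is only a 
sketch of the kind of API the discussion is about.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal WAL surface; not an HBase interface. */
interface Wal<E> {
  long append(E entry);          // returns the sequence id assigned to the entry
  void sync(long seqId);         // blocks until entries up to seqId are durable
  List<E> tail(long fromSeqId);  // entries at or after fromSeqId (replication's use case)
}

/** Toy implementation: an in-memory list, where "durability" is a no-op. */
class InMemoryWal<E> implements Wal<E> {
  private final List<E> entries = new ArrayList<>();

  @Override public synchronized long append(E entry) {
    entries.add(entry);
    return entries.size() - 1;
  }

  @Override public void sync(long seqId) {
    // nothing to flush for an in-memory log
  }

  @Override public synchronized List<E> tail(long fromSeqId) {
    return new ArrayList<>(entries.subList((int) fromSeqId, entries.size()));
  }
}

public class WalApiSketch {
  public static void main(String[] args) {
    Wal<String> wal = new InMemoryWal<>();
    long first = wal.append("edit-1");
    wal.append("edit-2");
    wal.sync(first);
    System.out.println(wal.tail(1)); // prints [edit-2]
  }
}
```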





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Bahram Chehrazy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760431#comment-16760431
 ] 

Bahram Chehrazy commented on HBASE-21844:
-

[~Apache9], one possible root cause is that the region state for meta does not 
get updated when the server crashes. I'm working on another patch for that too. 
But I believe we could all benefit from a resilient master that knows how to 
correct its own state.

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Comment Edited] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760401#comment-16760401
 ] 

BELUGA BEHR edited comment on HBASE-19616 at 2/5/19 2:23 AM:
-

[~stack]  Thank you for the review! :)

I'm not sure where that line of code is:

{code}
if (context != null) {
{code}

It used to be in there, but I took it out.

Please consider the latest patch for inclusion into the project.


was (Author: belugabehr):
[~stack]  Thank you for the review! :)

I'm not sure where that line of code is:

{code}
if (context != null) {
{code}

It used to be in there, but I took it out.

Please consider the latest patch for inclusion into the project.

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. It currently 
> loops with a 500ms interval to check if some sort of condition has been met 
> until the amount of time spent looping is greater than some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.





[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760406#comment-16760406
 ] 

Hudson commented on HBASE-21795:


Results for branch master
[build #772 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/772/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns back to client
> *Actual:* 
> The modify table procedure completes but control does not return to the 
> client until the catalog janitor runs and deletes the parent, or a timeout occurs.





[jira] [Commented] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760405#comment-16760405
 ] 

Hudson commented on HBASE-21840:


Results for branch master
[build #772 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/772/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/772//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> And later the test is stuck, since the MVCC cannot be advanced any more.





[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760402#comment-16760402
 ] 

Duo Zhang commented on HBASE-21843:
---

Anyway, [~stack], I think we have some missing parts in the ITBLL test; for 
example, we do not restart datanodes and namenodes in ITBLL. And to address 
this issue, maybe we even need to fill HDFS up...

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a give RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 





[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760396#comment-16760396
 ] 

Hudson commented on HBASE-21795:


Results for branch branch-2.1
[build #833 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns back to client
> *Actual:* 
> The modify table procedure completes and control does not return back to 
> client, until catalog janitor runs and deletes parent or future timeout occurs





[jira] [Commented] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760401#comment-16760401
 ] 

BELUGA BEHR commented on HBASE-19616:
-

[~stack]  Thank you for the review! :)

I'm not sure where that line of code is:

{code}
if (context != null) {
{code}

It used to be in there, but I took it out.

Please consider the latest patch for inclusion into the project.

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. It currently 
> loops with a 500ms interval to check if some sort of condition has been met 
> until the amount of time spent looping is greater than some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.
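The polling-versus-latch change described in the issue can be sketched as follows. This is an illustrative standalone example, not the actual LogCleaner code: the method names, the flag, and the timeout values are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LatchVsPolling {

  // Old style: re-check a flag every intervalMs until the timeout elapses.
  // Wastes up to intervalMs of sleep after the condition is already met,
  // and requires a configuration knob for the interval.
  static boolean waitByPolling(AtomicBoolean done, long timeoutMs,
      long intervalMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!done.get() && System.currentTimeMillis() < deadline) {
      Thread.sleep(intervalMs);
    }
    return done.get();
  }

  // New style: the worker calls latch.countDown(); await() returns as soon
  // as the count reaches zero, so no sleep interval is needed at all.
  static boolean waitByLatch(CountDownLatch latch, long timeoutMs)
      throws InterruptedException {
    return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws Exception {
    CountDownLatch latch = new CountDownLatch(1);
    new Thread(latch::countDown).start(); // worker signals completion
    System.out.println(waitByLatch(latch, 5000)); // prints "true"
  }
}
```

This also removes the sleep-interval configuration entirely, which matches the issue description above.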





[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760400#comment-16760400
 ] 

Duo Zhang commented on HBASE-21843:
---

And which version are you using? On master and branch-2, HBASE-21588 has 
introduced procedure-based WAL splitting; could this also affect the zk based 
wal splitting?

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 





[jira] [Commented] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760397#comment-16760397
 ] 

Hudson commented on HBASE-21819:


Results for branch branch-2.1
[build #833 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/833//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>






[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760392#comment-16760392
 ] 

Hudson commented on HBASE-21795:


Results for branch branch-2.0
[build #1317 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1317/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1317//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1317//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1317//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns back to client
> *Actual:* 
> The modify table procedure completes and control does not return back to 
> client, until catalog janitor runs and deletes parent or future timeout occurs





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760343#comment-16760343
 ] 

Hadoop QA commented on HBASE-21844:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 9s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m  
8s{color} | {color:red} hbase-server: The patch generated 17 new + 147 
unchanged - 0 fixed = 164 total (was 147) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
11s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 36s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}128m  
4s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21844 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957544/0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 7cb779d67829 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5f8bdd52a1 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15875/artifact/patchprocess/diff-checkstyle-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15875/testReport/ |
| Max. process+thread count | 5072 (vs. ulimit of 1) |
| modules | C: hbase-server U: 

[jira] [Commented] (HBASE-21817) handle corrupted cells like other corrupted WAL cases

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760344#comment-16760344
 ] 

Hadoop QA commented on HBASE-21817:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  
0m  0s{color} | {color:orange} The patch doesn't appear to include any new or 
modified tests. Please justify why no new tests are needed for this patch. Also 
please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
32s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
15s{color} | {color:red} hbase-server: The patch generated 2 new + 39 unchanged 
- 0 fixed = 41 total (was 39) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
34s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 38s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}126m  3s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.io.asyncfs.TestSaslFanOutOneBlockAsyncDFSOutput |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21817 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957539/HBASE-21817.01.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 7dec52487291 4.4.0-131-generic #157~14.04.1-Ubuntu SMP Fri Jul 
13 08:53:17 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5f8bdd52a1 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 

[jira] [Commented] (HBASE-21837) Potential race condition when WALSplitter writes the split results

2019-02-04 Thread Bahram Chehrazy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760337#comment-16760337
 ] 

Bahram Chehrazy commented on HBASE-21837:
-

I don't have a call stack directly caused by this, but it would be very 
similar. Whether the corruption existed in the input file or was created by 
this race condition during processing, it would blow up in writeBuffer with a 
similar call stack.
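A minimal, self-contained sketch of the race being discussed; this is NOT the actual WALSplitter/EntryBuffers code (the class and method names below are illustrative), just a demonstration of why an unguarded chunk queue breaks under parallel writer threads and how a shared monitor fixes it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class ChunkBuffer {
  private final Queue<byte[]> chunks = new ArrayDeque<>();

  // ArrayDeque is not thread safe: two writer threads polling concurrently
  // can corrupt its internal state and hand back garbage. Guarding both the
  // producer and the consumer path with the same monitor removes the race.
  public synchronized void appendChunk(byte[] chunk) {
    chunks.add(chunk);
  }

  public synchronized byte[] getChunkToWrite() {
    return chunks.poll(); // null when nothing is buffered
  }

  public static void main(String[] args) throws InterruptedException {
    ChunkBuffer buf = new ChunkBuffer();
    Thread producer = new Thread(() -> {
      for (int i = 0; i < 1000; i++) {
        buf.appendChunk(new byte[] { (byte) i });
      }
    });
    producer.start();
    int taken = 0;
    while (taken < 1000) {       // consumer drains concurrently
      if (buf.getChunkToWrite() != null) {
        taken++;
      }
    }
    producer.join();
    System.out.println(taken); // prints 1000
  }
}
```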

> Potential race condition when WALSplitter writes the split results
> --
>
> Key: HBASE-21837
> URL: https://issues.apache.org/jira/browse/HBASE-21837
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Priority: Major
>
> When WALSplitter writes the split buffer, it calls 
> EntryBuffers.getChunkToWrite in WriterThread.doRun. But getChunkToWrite is 
> not thread safe, and could return garbage when called in parallel. Later when 
> it tries to write the chunk using writeBuffer it could throw an exception 
> like this:
>  
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY 
> java.lang.RuntimeException: java.lang.NegativeArraySizeException at 
> org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349) at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196) at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
>  at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
>  at 
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
>  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745) Caused by: 
> java.lang.NegativeArraySizeException at 
> org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113) at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)





[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760326#comment-16760326
 ] 

Duo Zhang commented on HBASE-21843:
---

Is this the same with HBASE-21844? The HDFS is gone, and when it comes back, we 
have ‘lost’ several procedures.

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760323#comment-16760323
 ] 

Duo Zhang commented on HBASE-21844:
---

In general we need to find out why this could happen and fix the root cause. I 
do not like adding checks everywhere, as it will make the code confusing. And 
to fix the cluster, you can use HBCK to schedule an SCP.

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. Following log lines shows this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  
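The recovery decision described in the issue can be sketched as follows. This is a hedged, standalone illustration of the logic, not HBase's actual HMaster/AssignmentManager API; the enum, method, and return labels are all hypothetical.

```java
public class MetaRecoveryCheck {

  enum State { OPEN, OFFLINE }

  // If meta says a region is OPEN on a server that is neither online nor
  // covered by a ServerCrashProcedure, the master should expire that server
  // (scheduling an SCP) instead of waiting in a holding pattern forever.
  static String decide(State state, boolean serverOnline,
      boolean scpScheduled) {
    if (state != State.OPEN) {
      return "ASSIGN";          // not open: take the normal assignment path
    }
    if (serverOnline) {
      return "OK";              // healthy: nothing to do
    }
    return scpScheduled
        ? "WAIT_FOR_SCP"        // an SCP will recover the region
        : "EXPIRE_SERVER";      // no SCP exists: trigger recovery ourselves
  }

  public static void main(String[] args) {
    // The stuck case from the log above: state=OPEN, server dead, no SCP.
    System.out.println(decide(State.OPEN, false, false)); // prints "EXPIRE_SERVER"
  }
}
```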





[jira] [Commented] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760316#comment-16760316
 ] 

Hudson commented on HBASE-21840:


Results for branch branch-2.2
[build #17 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> And later the test is stuck, since the MVCC can not be advanced any more.





[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760317#comment-16760317
 ] 

Hudson commented on HBASE-21795:


Results for branch branch-2.2
[build #17 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/17//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns to the client
> *Actual:* 
> The modify table procedure completes but control does not return to the 
> client until the catalog janitor runs and deletes the parent, or the timeout 
> eventually occurs





[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760312#comment-16760312
 ] 

Hudson commented on HBASE-21795:


Results for branch branch-2
[build #1660 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns to the client
> *Actual:* 
> The modify table procedure completes but control does not return to the 
> client until the catalog janitor runs and deletes the parent, or the timeout 
> eventually occurs





[jira] [Commented] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760311#comment-16760311
 ] 

Hudson commented on HBASE-21840:


Results for branch branch-2
[build #1660 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1660//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> And later the test is stuck, since the MVCC can not be advanced any more.





[jira] [Commented] (HBASE-21811) region can be opened on two servers due to race condition with procedures and server reports

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760309#comment-16760309
 ] 

Duo Zhang commented on HBASE-21811:
---

We do check the start code, but only before executing any procedures; 
please see the code carefully.
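
The start-code check under discussion can be sketched as follows. This is a
minimal illustration of the idea only (class and method names are assumptions,
not the actual HBase code): a region server should accept an open request only
when the master addressed this exact server incarnation, i.e. same host/port
and same start code.

```java
// Sketch of a start-code guard on the region server side. All names here are
// illustrative assumptions, not the actual HBase implementation.
public class StartCodeGuard {

    /**
     * Accept the open request only if the master targeted this exact server
     * incarnation: same host/port AND same start code. A request addressed to
     * a previous incarnation (older start code) must be rejected, otherwise a
     * restarted RS could open a region the master thinks is elsewhere.
     */
    static boolean shouldAcceptOpen(String targetHostPort, long targetStartCode,
                                    String myHostPort, long myStartCode) {
        return targetHostPort.equals(myHostPort) && targetStartCode == myStartCode;
    }

    public static void main(String[] args) {
        // Master addressed the current incarnation: accept.
        System.out.println(shouldAcceptOpen(
            "rs1:17020", 1548727752747L, "rs1:17020", 1548727752747L));
        // Master addressed the dead, pre-restart incarnation: reject.
        System.out.println(shouldAcceptOpen(
            "rs1:17020", 1548726406632L, "rs1:17020", 1548727752747L));
    }
}
```

The start codes in the example are taken from the server names in the logs
quoted below (the `154872...` suffix is the server's startup timestamp).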

> region can be opened on two servers due to race condition with procedures and 
> server reports
> 
>
> Key: HBASE-21811
> URL: https://issues.apache.org/jira/browse/HBASE-21811
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.3.0
>
> Attachments: HBASE-21811-UT.patch, HBASE-21811-v1.patch, 
> HBASE-21811-v2.patch, HBASE-21811.patch
>
>
> Looks like the region server responses are being processed incorrectly in 
> places, allowing the region to be opened on two servers.
> * The region server report handling in procedures should check which server 
> is reporting.
> * Also, although I didn't check (and it isn't implicated in this bug), the RS 
> must check on OPEN that it is actually the RS the master sent the open to 
> (w.r.t. start timestamp)
> This was previously "mitigated" by master killing the RS with incorrect 
> reports, but due to race conditions with reports and assignment the kill was 
> replaced with a warning, so now this condition persists.
> Regardless, the kill approach is not a good fix because there's still a 
> window when a region can be opened on two servers.
> A region is being opened by server_48c. The server dies, and we process the 
> retry correctly (retry=3 because 2 previous similar open failures were 
> processed correctly).
> We start opening it on server_1aa now.
> {noformat}
> 2019-01-28 18:12:09,862 INFO  [KeepAlivePEWorker-104] 
> assignment.RegionStateStore: pid=4915 updating hbase:meta 
> row=8be2a423b16471b9417f0f7de04281c6, regionState=ABNORMALLY_CLOSED
> 2019-01-28 18:12:09,862 INFO  [KeepAlivePEWorker-104] 
> procedure.ServerCrashProcedure: pid=11944, 
> state=RUNNABLE:SERVER_CRASH_ASSIGN, hasLock=true; ServerCrashProcedure 
> server=server_48c,17020,1548726406632, splitWal=true, meta=false found RIT 
> pid=4915, ppid=7, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, 
> hasLock=true; TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=OPENING, 
> location=server_48c,17020,1548726406632, table=table, 
> region=8be2a423b16471b9417f0f7de04281c6
> 2019-01-28 18:12:10,778 INFO  [KeepAlivePEWorker-80] 
> assignment.TransitRegionStateProcedure: Retry=3 of max=2147483647; pid=4915, 
> ppid=7, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; 
> TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=ABNORMALLY_CLOSED, 
> location=null
> ...
> 2019-01-28 18:12:10,902 INFO  [KeepAlivePEWorker-80] 
> assignment.TransitRegionStateProcedure: Starting pid=4915, ppid=7, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
> TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=ABNORMALLY_CLOSED, 
> location=null; forceNewPlan=true, retain=false
> 2019-01-28 18:12:11,114 INFO  [PEWorker-7] assignment.RegionStateStore: 
> pid=4915 updating hbase:meta row=8be2a423b16471b9417f0f7de04281c6, 
> regionState=OPENING, regionLocation=server_1aa,17020,1548727658713
> {noformat}
> However, we get the remote procedure failure from 48c after we've already 
> started that.
> It actually tried to open on the restarted RS, which makes me wonder if this 
> is safe also w.r.t. other races - what if RS already initialized and didn't 
> error out?
> Need to check if we verify the start code expected by master on RS when 
> opening.
> {noformat}
> 2019-01-28 18:12:12,179 WARN  [RSProcedureDispatcher-pool4-t362] 
> assignment.RegionRemoteProcedureBase: The remote operation pid=11050, 
> ppid=4915, state=SUCCESS, hasLock=false; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
> {ENCODED => 8be2a423b16471b9417f0f7de04281c6 ... to server 
> server_48c,17020,1548726406632 failed
> org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server 
> server_48c,17020,1548727752747 is not running yet
> 2019-01-28 18:12:12,179 WARN  [RSProcedureDispatcher-pool4-t362] 
> procedure.RSProcedureDispatcher: server server_48c,17020,1548726406632 is not 
> up for a while; try a new one
> {noformat}
> Without any other reason (at least logged), the RIT immediately retries again 
> and chooses a new candidate. It then retries again and goes to the new 48c, 
> but that's unrelated.
> 

[jira] [Commented] (HBASE-20053) Remove .cmake file extension from .gitignore

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760307#comment-16760307
 ] 

Hudson commented on HBASE-20053:


Results for branch HBASE-20053
[build #1 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20053/1/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20053/1//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20053/1//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20053/1//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Remove .cmake file extension from .gitignore
> 
>
> Key: HBASE-20053
> URL: https://issues.apache.org/jira/browse/HBASE-20053
> Project: HBase
>  Issue Type: Sub-task
>  Components: build, community
>Affects Versions: HBASE-14850
>Reporter: Ted Yu
>Assignee: Norbert Kalmar
>Priority: Minor
>  Labels: build
> Fix For: HBASE-14850
>
> Attachments: HBASE-20053-HBASE-14850.v001.patch
>
>
> There are .cmake files under hbase-native-client/cmake/ which are under 
> source control.
> The .cmake extension should be taken out of hbase-native-client/.gitignore





[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760279#comment-16760279
 ] 

Hadoop QA commented on HBASE-21843:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
56s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
35s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
33s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
9m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}148m 41s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21843 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957517/HBASE-21843.master.001.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 2fb024a0c73d 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5f8bdd52a1 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15874/artifact/patchprocess/whitespace-eol.txt
 |
| unit | 

[jira] [Assigned] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HBASE-21844:


Assignee: Bahram Chehrazy

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760246#comment-16760246
 ] 

Sergey Shelukhin commented on HBASE-21844:
--

Looks good to me... I know this can happen when proc WAL is deleted, but in 
this case no manual intervention was done so the cluster got into this state on 
its own. +1 pending tests/etc



> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Updated] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-21844:
-
Status: Patch Available  (was: Open)

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Assignee: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Updated] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Bahram Chehrazy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bahram Chehrazy updated HBASE-21844:

Attachment: 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Commented] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Bahram Chehrazy (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760224#comment-16760224
 ] 

Bahram Chehrazy commented on HBASE-21844:
-

I've tested and uploaded a patch for this.
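
The recovery decision this thread describes can be sketched in a few lines.
This is only an illustration of the condition, with assumed names (not the
actual patch): if meta's region state says OPEN on a server that is no longer
online and no ServerCrashProcedure (SCP) is in flight, the master should expire
that server so a fresh SCP can recover meta instead of hanging in
waitForMetaOnline.

```java
// Minimal sketch of the meta-recovery check discussed in this thread.
// Class and method names are illustrative assumptions, not HBase code.
public class MetaRecoveryCheck {

    /**
     * True when master startup would otherwise hang: meta is marked OPEN on a
     * dead server and no SCP is scheduled to recover it, so the server must be
     * expired to trigger a new SCP.
     */
    static boolean shouldExpireMetaServer(boolean stateIsOpen,
                                          boolean serverOnline,
                                          boolean scpScheduled) {
        return stateIsOpen && !serverOnline && !scpScheduled;
    }

    public static void main(String[] args) {
        // OPEN on a dead server, no SCP: expire the server to force recovery.
        System.out.println(shouldExpireMetaServer(true, false, false)); // true
        // OPEN on a dead server, but an SCP is already recovering it: wait.
        System.out.println(shouldExpireMetaServer(true, false, true));  // false
    }
}
```

Under this sketch, the "stuck" state from the logs quoted below (state=OPEN,
server dead, ServerCrashProcedures=false) is exactly the case where the check
fires.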

> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Priority: Major
> Attachments: 
> 0001-HBASE-21844-Handling-incorrect-Meta-state-on-Zookeep.patch
>
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  





[jira] [Updated] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Bahram Chehrazy (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bahram Chehrazy updated HBASE-21844:

Description: 
If the active master crashes after meta server dies, there is a slight chance 
of master getting into a state where the ZK says meta is OPEN, but the server 
is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
and the procWALs were corrupted). In this case the waitForMetaOnline never 
returns.

 

We've seen this happening a few times when there had been a temporary HDFS 
outage. The following log lines show this state.

 

2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=

{1588230740 *state=*OPEN**, ts=1547780128227, 
server=*,16020,1547776821322}

; *ServerCrashProcedures=false*. Master startup cannot progress, in 
holding-pattern until region onlined.

 

I'm still investigating why and how to prevent getting into this bad state, but 
nevertheless the master should be able to recover during a restart by 
initiating a new SCP to fix the meta.

 

 

  was:
If the active master crashes after meta server dies, there is a slight chance 
of master getting into a state where the ZK says meta is OPEN, but the server 
is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
and the procWALs were corrupted). In this case the waitForMetaOnline never 
returns.

 

We've seen this happening a few times when there had been a temporary HDFS 
outage. The following log lines show this state.

 

2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 
*state=OPEN*, ts=1547780128227, server=*,16020,1547776821322}; 
*ServerCrashProcedures=false*. Master startup cannot progress, in 
holding-pattern until region onlined.

 

I'm still investigating why and how to prevent getting into this bad state, but 
nevertheless the master should be able to recover during a restart by 
initiating a new SCP to fix the meta.

 

 


> Master could get stuck in initializing state while waiting for meta
> ---
>
> Key: HBASE-21844
> URL: https://issues.apache.org/jira/browse/HBASE-21844
> Project: HBase
>  Issue Type: Bug
>  Components: master, meta
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Priority: Major
>
> If the active master crashes after meta server dies, there is a slight chance 
> of master getting into a state where the ZK says meta is OPEN, but the server 
> is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
> and the procWALs were corrupted). In this case the waitForMetaOnline never 
> returns.
>  
> We've seen this happening a few times when there had been a temporary HDFS 
> outage. The following log lines show this state.
>  
> 2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
> master.HMaster: hbase:meta,,1.1588230740 is NOT online; state=
> {1588230740 *state=*OPEN**, ts=1547780128227, 
> server=*,16020,1547776821322}
> ; *ServerCrashProcedures=false*. Master startup cannot progress, in 
> holding-pattern until region onlined.
>  
> I'm still investigating why and how to prevent getting into this bad state, 
> but nevertheless the master should be able to recover during a restart by 
> initiating a new SCP to fix the meta.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21844) Master could get stuck in initializing state while waiting for meta

2019-02-04 Thread Bahram Chehrazy (JIRA)
Bahram Chehrazy created HBASE-21844:
---

 Summary: Master could get stuck in initializing state while 
waiting for meta
 Key: HBASE-21844
 URL: https://issues.apache.org/jira/browse/HBASE-21844
 Project: HBase
  Issue Type: Bug
  Components: master, meta
Affects Versions: 3.0.0
Reporter: Bahram Chehrazy


If the active master crashes after meta server dies, there is a slight chance 
of master getting into a state where the ZK says meta is OPEN, but the server 
is dead and there is no active SCP to recover it (perhaps the SCP has aborted 
and the procWALs were corrupted). In this case the waitForMetaOnline never 
returns.

 

We've seen this happening a few times when there had been a temporary HDFS 
outage. The following log lines show this state.

 

2019-01-17 18:55:48,497 WARN  [master/:16000:becomeActiveMaster] 
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 
*state=OPEN*, ts=1547780128227, server=*,16020,1547776821322}; 
*ServerCrashProcedures=false*. Master startup cannot progress, in 
holding-pattern until region onlined.

 

I'm still investigating why and how to prevent getting into this bad state, but 
nevertheless the master should be able to recover during a restart by 
initiating a new SCP to fix the meta.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20662) Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement

2019-02-04 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760211#comment-16760211
 ] 

Sakthi commented on HBASE-20662:


lgtm!

> Increasing space quota on a violated table does not remove 
> SpaceViolationPolicy.DISABLE enforcement
> ---
>
> Key: HBASE-20662
> URL: https://issues.apache.org/jira/browse/HBASE-20662
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.0.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20662.master.001.patch, 
> HBASE-20662.master.002.patch, HBASE-20662.master.003.patch, 
> HBASE-20662.master.004.patch, HBASE-20662.master.004.patch, 
> HBASE-20662.master.005.patch, HBASE-20662.master.006.patch, 
> HBASE-20662.master.007.patch, HBASE-20662.master.008.patch, 
> HBASE-20662.master.008.patch, HBASE-20662.master.009.patch, 
> HBASE-20662.master.009.patch, HBASE-20662.master.010.patch, screenshot.png
>
>
> *Steps to reproduce*
>  * Create a table and set quota with {{SpaceViolationPolicy.DISABLE}} having 
> limit say 2MB
>  * Now put rows until space quota is violated and table gets disabled
>  * Next, increase space quota with limit say 4MB on the table
>  * Now try putting a row into the table
> {code:java}
>  private void testSetQuotaThenViolateAndFinallyIncreaseQuota() throws 
> Exception {
> SpaceViolationPolicy policy = SpaceViolationPolicy.DISABLE;
> Put put = new Put(Bytes.toBytes("to_reject"));
> put.addColumn(Bytes.toBytes(SpaceQuotaHelperForTests.F1), 
> Bytes.toBytes("to"),
>   Bytes.toBytes("reject"));
> // Do puts until we violate space policy
> final TableName tn = writeUntilViolationAndVerifyViolation(policy, put);
> // Now, increase limit
> setQuotaLimit(tn, policy, 4L);
> // Put some row now: should not violate as quota limit increased
> verifyNoViolation(policy, tn, put);
>   }
> {code}
> *Expected*
> We should be able to put data as long as newly set quota limit is not reached
> *Actual*
> We fail to put any new row even after increasing limit
> *Root cause*
> Increasing quota on a violated table triggers the table to be enabled, but 
> since the table is already in violation, the system does not allow it to be 
> enabled (may be thinking that a user is trying to enable it)
> *Relevant exception trace*
> {noformat}
> 2018-05-31 00:34:27,563 INFO  [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> client.HBaseAdmin$14(844): Started enable of 
> testSetQuotaAndThenIncreaseQuotaWithDisable0
> 2018-05-31 00:34:27,571 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=42525] 
> ipc.CallRunner(142): callId: 11 service: MasterService methodName: 
> EnableTable size: 104 connection: 127.0.0.1:38030 deadline: 1527707127568, 
> exception=org.apache.hadoop.hbase.security.AccessDeniedException: Enabling 
> the table 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to 
> a violated space quota.
> 2018-05-31 00:34:27,571 ERROR [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> quotas.RegionServerSpaceQuotaManager(210): Failed to disable space violation 
> policy for testSetQuotaAndThenIncreaseQuotaWithDisable0. This table will 
> remain in violation.
> org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Enabling the table 
> 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to a 
> violated space quota.
>   at org.apache.hadoop.hbase.master.HMaster$6.run(HMaster.java:2275)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
>   at org.apache.hadoop.hbase.master.HMaster.enableTable(HMaster.java:2258)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.enableTable(MasterRpcServices.java:725)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> 

[jira] [Commented] (HBASE-21800) RegionServer aborted due to NPE from MetaTableMetrics coprocessor

2019-02-04 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760208#comment-16760208
 ] 

Sakthi commented on HBASE-21800:


Ping [~apurtell], [~Apache9] anything else pending here? :)

> RegionServer aborted due to NPE from MetaTableMetrics coprocessor
> -
>
> Key: HBASE-21800
> URL: https://issues.apache.org/jira/browse/HBASE-21800
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, meta, metrics, Operability
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Critical
>  Labels: Meta
> Attachments: hbase-21800.master.001.patch, 
> hbase-21800.master.002.patch, hbase-21800.master.003.patch
>
>
> I was just playing around with the code, trying to capture "Top K" table 
> metrics from MetaMetrics, when I bumped into this issue. Though we are not 
> currently capturing "Top K" table metrics, we can still encounter this issue 
> because of the "Top K Clients" metric that is implemented using the lossy 
> counting algorithm.
>  
> RegionServer gets aborted due to a NPE from MetaTableMetrics coprocessor. The 
> log looks somewhat like this:
> {code:java}
> 2019-01-28 23:31:10,311 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> coprocessor.CoprocessorHost: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> 2019-01-28 23:31:10,314 ERROR 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16020] 
> regionserver.HRegionServer: * ABORTING region server 
> 10.0.0.24,16020,1548747043814: The coprocessor 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics threw 
> java.lang.NullPointerException *
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.markMeterIfPresent(MetaTableMetrics.java:123)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.tableMetricRegisterAndMark2(MetaTableMetrics.java:233)
>   at 
> org.apache.hadoop.hbase.coprocessor.MetaTableMetrics$ExampleRegionObserverMeta.preGetOp(MetaTableMetrics.java:82)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:840)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$19.call(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:551)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:625)
>   at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preGet(RegionCoprocessorHost.java:837)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2608)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2547)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41998)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
>   at 

[jira] [Commented] (HBASE-21804) Remove 0.94 check from the Linkchecker job

2019-02-04 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760203#comment-16760203
 ] 

Sakthi commented on HBASE-21804:


[~psomogyi], yes, that would work well! :)

> Remove 0.94 check from the Linkchecker job
> --
>
> Key: HBASE-21804
> URL: https://issues.apache.org/jira/browse/HBASE-21804
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21804.master.001.patch
>
>
> This is a pretty old release. Even though we don't have the link to the doc 
> from our main page, the linkchecker job lands directly at 
> [https://hbase.apache.org/0.94/] which has around 90 odd missing file issues. 
> I haven't yet looked at the missing anchors stuff yet.
> We can set linkchecker to not check 0.94.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21817) handle corrupted cells like other corrupted WAL cases

2019-02-04 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-21817:
-
Summary: handle corrupted cells like other corrupted WAL cases  (was: skip 
records with corrupted cells in WAL splitting)

> handle corrupted cells like other corrupted WAL cases
> -
>
> Key: HBASE-21817
> URL: https://issues.apache.org/jira/browse/HBASE-21817
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HBASE-21817.01.patch, HBASE-21817.patch
>
>
> See HBASE-21601 for context.
> I looked at the code a bit but it will take a while to understand, so for now 
> I'm going to mitigate it by skipping such records. Given that this record is 
> bogus, and the lengths are intact, for this scenario it's safe to do so. 
> However, it's possible I guess to have a bug where skipping such record would 
> lead to data loss. Regardless, failure to split the WAL will lead to even 
> more data loss in this case so it should be ok to handle errors where the 
> structure is correct but cells are corrupted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21817) skip records with corrupted cells in WAL splitting

2019-02-04 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760197#comment-16760197
 ] 

Sergey Shelukhin edited comment on HBASE-21817 at 2/4/19 8:42 PM:
--

Upon further examination we already have a path for handling corrupted WALs so 
I will just use that.
The above could still be done in a separate JIRA ("splitting" the WAL in HBCK 
to separate good and bad records), however this case is rather obscure so I'm 
not sure it's worth it, unless we have a release without the fix to the 
original issue that writes such WAL.


was (Author: sershe):
Upon further examination we already have a path for handling corrupted WALs so 
I will just use that.
The above could still be done in a separate JIRA ("splitting" the WAL in HBCK 
to separate good and bad records), however this case is rather obscure so I'm 
not sure it's worth it. 

> skip records with corrupted cells in WAL splitting
> --
>
> Key: HBASE-21817
> URL: https://issues.apache.org/jira/browse/HBASE-21817
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HBASE-21817.01.patch, HBASE-21817.patch
>
>
> See HBASE-21601 for context.
> I looked at the code a bit but it will take a while to understand, so for now 
> I'm going to mitigate it by skipping such records. Given that this record is 
> bogus, and the lengths are intact, for this scenario it's safe to do so. 
> However, it's possible I guess to have a bug where skipping such record would 
> lead to data loss. Regardless, failure to split the WAL will lead to even 
> more data loss in this case so it should be ok to handle errors where the 
> structure is correct but cells are corrupted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21817) skip records with corrupted cells in WAL splitting

2019-02-04 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760197#comment-16760197
 ] 

Sergey Shelukhin commented on HBASE-21817:
--

Upon further examination we already have a path for handling corrupted WALs so 
I will just use that.
The above could still be done in a separate JIRA ("splitting" the WAL in HBCK 
to separate good and bad records), however this case is rather obscure so I'm 
not sure it's worth it. 

> skip records with corrupted cells in WAL splitting
> --
>
> Key: HBASE-21817
> URL: https://issues.apache.org/jira/browse/HBASE-21817
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HBASE-21817.01.patch, HBASE-21817.patch
>
>
> See HBASE-21601 for context.
> I looked at the code a bit but it will take a while to understand, so for now 
> I'm going to mitigate it by skipping such records. Given that this record is 
> bogus, and the lengths are intact, for this scenario it's safe to do so. 
> However, it's possible I guess to have a bug where skipping such record would 
> lead to data loss. Regardless, failure to split the WAL will lead to even 
> more data loss in this case so it should be ok to handle errors where the 
> structure is correct but cells are corrupted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21817) skip records with corrupted cells in WAL splitting

2019-02-04 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-21817:
-
Attachment: HBASE-21817.01.patch

> skip records with corrupted cells in WAL splitting
> --
>
> Key: HBASE-21817
> URL: https://issues.apache.org/jira/browse/HBASE-21817
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: HBASE-21817.01.patch, HBASE-21817.patch
>
>
> See HBASE-21601 for context.
> I looked at the code a bit but it will take a while to understand, so for now 
> I'm going to mitigate it by skipping such records. Given that this record is 
> bogus, and the lengths are intact, for this scenario it's safe to do so. 
> However, it's possible I guess to have a bug where skipping such record would 
> lead to data loss. Regardless, failure to split the WAL will lead to even 
> more data loss in this case so it should be ok to handle errors where the 
> structure is correct but cells are corrupted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760188#comment-16760188
 ] 

stack commented on HBASE-21843:
---

Good one [~wchevreuil].

The SCP doesn't get rerun? The step w/ meta assign doesn't get rerun because it 
hasn't completed yet?

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions in meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21843:
-
Status: Patch Available  (was: Open)

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.0, 3.0.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions in meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760133#comment-16760133
 ] 

Wellington Chevreuil commented on HBASE-21843:
--

Looked at the server logs to try to understand it. Here's my conclusion on 
what led to that state:
1) SCP processes log split before Assign the regions that were on the crashed 
server;
2) While doing log split, it first renamed the WAL dir to add the "-splitting" 
suffix, then found no files in that dir and removed it. At this point, there 
was no WAL dir for RS1-T1 anymore.
3) SCP continues to SERVER_CRASH_ASSIGN. It all goes well, but just before 
updating meta with the new RS assignment, HDFS enters safe mode, the meta 
update fails, and the whole HBase cluster crashes. Now meta still has the 
original RS1-T1 assignment, but there is no WAL dir for it anymore.
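The failure sequence above is why the issue proposes an extra guard when the master loads meta. A hedged, self-contained sketch of that guard follows; the names (walDirServers, resolveRegionState) are illustrative, not the real AssignmentManager API:

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of the proposed loadMeta check: a region recorded as OPEN on a
 * server that is neither online nor has a WAL dir can never be recovered by
 * an SCP, so it is marked OFFLINE to make it assignable again.
 */
public class LoadMetaGuard {
    final Set<String> onlineServers = new HashSet<>();
    final Set<String> walDirServers = new HashSet<>(); // startcodes with a WAL dir on HDFS

    String resolveRegionState(String stateInMeta, String hostingServer) {
        if (!"OPEN".equals(stateInMeta)) {
            return stateInMeta;
        }
        if (onlineServers.contains(hostingServer)) {
            return stateInMeta; // server is alive; nothing to fix
        }
        if (walDirServers.contains(hostingServer)) {
            return stateInMeta; // dead, but an SCP can still split its WALs
        }
        return "OFFLINE"; // dead server, no WAL dir: no SCP will ever run
    }

    public static void main(String[] args) {
        LoadMetaGuard g = new LoadMetaGuard();
        g.walDirServers.add("RS1-T2"); // crashed, but its WAL dir survived
        System.out.println(g.resolveRegionState("OPEN", "RS1-T1")); // OFFLINE
        System.out.println(g.resolveRegionState("OPEN", "RS1-T2")); // OPEN
    }
}
```

Marking the region OFFLINE lets the normal assignment machinery pick it up, instead of leaving it "open" on a server the master can never expire.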

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions in meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21837) Potential race condition when WALSplitter writes the split results

2019-02-04 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760115#comment-16760115
 ] 

Sergey Shelukhin commented on HBASE-21837:
--

[~bahramch] the exception in the description has a caused-by
{noformat}
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113)
at 
org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
at 
org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
{noformat}
This is caused by the input cell being corrupted, so the column family cannot 
be retrieved... it's another instance of the same corrupted-WAL bug, I think, 
so the call stack may be unrelated.
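The NegativeArraySizeException is consistent with that reading: cloneFamily allocates a byte[] sized by the cell's serialized family length, so a corrupted (negative) length blows up at allocation, before any copy happens. A simplified stand-in (not the real Cell/CellUtil code) showing the mechanism and the obvious validity check:

```java
import java.util.Arrays;

/** Toy demo of cloning a cell's family whose serialized length may be corrupt. */
public class CloneFamilyDemo {
    static byte[] cloneFamily(byte[] buf, int familyOffset, int familyLength) {
        // A negative length from a corrupted record throws
        // NegativeArraySizeException right here, at allocation.
        byte[] out = new byte[familyLength];
        System.arraycopy(buf, familyOffset, out, 0, familyLength);
        return out;
    }

    /** Guard in the spirit of "treat it like any other corrupted WAL". */
    static boolean looksCorrupted(byte[] buf, int familyOffset, int familyLength) {
        return familyLength < 0 || familyOffset < 0
            || familyOffset + familyLength > buf.length;
    }

    public static void main(String[] args) {
        byte[] buf = {'f', '1'};
        System.out.println(Arrays.toString(cloneFamily(buf, 0, 2))); // [102, 49]
        System.out.println(looksCorrupted(buf, 0, -3));              // true
    }
}
```

A check like looksCorrupted lets the splitter route the record into the existing corrupted-WAL handling path instead of dying on an unchecked allocation.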

> Potential race condition when WALSplitter writes the split results
> --
>
> Key: HBASE-21837
> URL: https://issues.apache.org/jira/browse/HBASE-21837
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 3.0.0
>Reporter: Bahram Chehrazy
>Priority: Major
>
> When WALSplitter writes the split buffer, it calls 
> EntryBuffers.getChunkToWrite in WriterThread.doRun. But getChunkToWrite is 
> not thread safe, and could return garbage when called in parallel. Later when 
> it tries to write the chunk using writeBuffer it could throw an exception 
> like this:
>  
> 2018-12-13 17:01:12,208 ERROR [RS_LOG_REPLAY_OPS-regionserver/...] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY 
> java.lang.RuntimeException: java.lang.NegativeArraySizeException at 
> org.apache.hadoop.hbase.wal.WALSplitter$PipelineController.checkForErrors(WALSplitter.java:846)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$OutputSink.finishWriting(WALSplitter.java:1203)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(WALSplitter.java:1267)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:349) at 
> org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:196) at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.splitLog(SplitLogWorker.java:178)
>  at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.lambda$new$0(SplitLogWorker.java:90)
>  at 
> org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:70)
>  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745) Caused by: 
> java.lang.NegativeArraySizeException at 
> org.apache.hadoop.hbase.CellUtil.cloneFamily(CellUtil.java:113) at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.filterCellByStore(WALSplitter.java:1542)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1586)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1560)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1085)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1077)
>  at 
> org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1047)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21811) region can be opened on two servers due to race condition with procedures and server reports

2019-02-04 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760112#comment-16760112
 ] 

Sergey Shelukhin commented on HBASE-21811:
--

Hmm... the patch that got committed doesn't have the server check in the 
procedure wake-up path.
I think this check is still needed - in our case, the procedure was waiting 
for an event, just from a different server, so that report would still wake it 
up, right?
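Sketched concretely, the missing guard amounts to comparing the reporting server, startcode included, against the server the procedure is waiting on. This is an illustrative model only; the real check would live in the procedure's report handling, and the string-encoded server names stand in for ServerName:

```java
/**
 * Toy model of the wake-up guard: a region-transition procedure waiting on a
 * report from one server must ignore reports from any other server, including
 * a restarted instance of the same host (same host/port, new startcode).
 */
public class ReportWakeGuard {
    private final String expectedServer; // e.g. "server_48c,17020,1548726406632"

    ReportWakeGuard(String expectedServer) {
        this.expectedServer = expectedServer;
    }

    boolean shouldWake(String reportingServer) {
        // Full ServerName equality: host, port, and startcode must all match.
        return expectedServer.equals(reportingServer);
    }

    public static void main(String[] args) {
        ReportWakeGuard g = new ReportWakeGuard("server_48c,17020,1548726406632");
        // Restarted RS: same host/port, different startcode -> must not wake.
        System.out.println(g.shouldWake("server_48c,17020,1548727752747")); // false
        System.out.println(g.shouldWake("server_48c,17020,1548726406632")); // true
    }
}
```

Without the startcode in the comparison, the stale report from the old incarnation in the log above would have woken the procedure even though the region had already moved on to server_1aa.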

> region can be opened on two servers due to race condition with procedures and 
> server reports
> 
>
> Key: HBASE-21811
> URL: https://issues.apache.org/jira/browse/HBASE-21811
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.3.0
>
> Attachments: HBASE-21811-UT.patch, HBASE-21811-v1.patch, 
> HBASE-21811-v2.patch, HBASE-21811.patch
>
>
> Looks like the region server responses are being processed incorrectly in 
> places allowing te region to be opened on two servers.
> * The region server report handling in procedures should check which server 
> is reporting.
> * Also although I didn't check (and it isn't implicated in this bug), RS must 
> check in OPEN that it's actually the correct RS master sent open to (w.r.t. 
> start timestamp)
> This was previosuly "mitigated" by master killing the RS with incorrect 
> reports, but due to race conditions with reports and assignment the kill was 
> replaced with a warning, so now this condition persists.
> Regardless, the kill approach is not a good fix because there's still a 
> window when a region can be opened on two servers.
> A region is being opened by server_48c. The server dies, and we process the 
> retry correctly (retry=3 because 2 previous similar open failures were 
> processed correctly).
> We start opening it on server_1aa now.
> {noformat}
> 2019-01-28 18:12:09,862 INFO  [KeepAlivePEWorker-104] 
> assignment.RegionStateStore: pid=4915 updating hbase:meta 
> row=8be2a423b16471b9417f0f7de04281c6, regionState=ABNORMALLY_CLOSED
> 2019-01-28 18:12:09,862 INFO  [KeepAlivePEWorker-104] 
> procedure.ServerCrashProcedure: pid=11944, 
> state=RUNNABLE:SERVER_CRASH_ASSIGN, hasLock=true; ServerCrashProcedure 
> server=server_48c,17020,1548726406632, splitWal=true, meta=false found RIT 
> pid=4915, ppid=7, state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED, 
> hasLock=true; TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=OPENING, 
> location=server_48c,17020,1548726406632, table=table, 
> region=8be2a423b16471b9417f0f7de04281c6
> 2019-01-28 18:12:10,778 INFO  [KeepAlivePEWorker-80] 
> assignment.TransitRegionStateProcedure: Retry=3 of max=2147483647; pid=4915, 
> ppid=7, state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, hasLock=true; 
> TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=ABNORMALLY_CLOSED, 
> location=null
> ...
> 2019-01-28 18:12:10,902 INFO  [KeepAlivePEWorker-80] 
> assignment.TransitRegionStateProcedure: Starting pid=4915, ppid=7, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; 
> TransitRegionStateProcedure table=table, 
> region=8be2a423b16471b9417f0f7de04281c6, ASSIGN; rit=ABNORMALLY_CLOSED, 
> location=null; forceNewPlan=true, retain=false
> 2019-01-28 18:12:11,114 INFO  [PEWorker-7] assignment.RegionStateStore: 
> pid=4915 updating hbase:meta row=8be2a423b16471b9417f0f7de04281c6, 
> regionState=OPENING, regionLocation=server_1aa,17020,1548727658713
> {noformat}
> However, we get the remote procedure failure from 48c after we've already 
> started that.
> It actually tried to open on the restarted RS, which makes me wonder if this 
> is safe also w.r.t. other races - what if RS already initialized and didn't 
> error out?
> Need to check if we verify the start code expected by master on RS when 
> opening.
> {noformat}
> 2019-01-28 18:12:12,179 WARN  [RSProcedureDispatcher-pool4-t362] 
> assignment.RegionRemoteProcedureBase: The remote operation pid=11050, 
> ppid=4915, state=SUCCESS, hasLock=false; 
> org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure for region 
> {ENCODED => 8be2a423b16471b9417f0f7de04281c6 ... to server 
> server_48c,17020,1548726406632 failed
> org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: 
> org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server 
> server_48c,17020,1548727752747 is not running yet
> 2019-01-28 18:12:12,179 WARN  [RSProcedureDispatcher-pool4-t362] 
> procedure.RSProcedureDispatcher: server server_48c,17020,1548726406632 is not 
> up for a while; try a new one
> {noformat}
> Without any other reason (at 

[jira] [Commented] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760100#comment-16760100
 ] 

stack commented on HBASE-19616:
---

The javac complaints are minor

[WARNING] 
/testptch/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/security/visibility/VisibilityUtils.java:[181,59]
 [StringSplitter] Prefer Splitter to String.split
[WARNING] 
/testptch/hbase/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java:[169,46]
 [StringSplitter] Prefer Splitter to String.split
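For context, the StringSplitter warnings above come from error-prone; the pitfalls it guards against can be shown with plain JDK calls (a standalone sketch, no Guava involved):

```java
public class SplitSketch {
    public static void main(String[] args) {
        // String.split treats its argument as a regex and silently drops
        // trailing empty strings -- the two surprises error-prone's
        // StringSplitter check avoids by preferring Guava's Splitter.
        String[] commaParts = "a,b,,".split(",");
        System.out.println(commaParts.length); // 2 -- trailing empties dropped

        String[] dotParts = "a.b".split(".");  // "." is a regex: matches every char
        System.out.println(dotParts.length);   // 0 -- every segment empty, all dropped
    }
}
```

Guava's Splitter makes the separator literal and the empty-string handling explicit, which is why the check fires even when the current call happens to be safe.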

No need for this check anymore:

if (context != null) {

This is really nice cleanup.

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. It currently 
> loops with a 500ms interval to check if some sort of condition has been met 
> until the amount of time spent looping is greater than some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.
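The CountDownLatch pattern described above can be sketched as follows (a minimal standalone example, not the patch's actual code; the worker body and timeout value are placeholders):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchSketch {
    public static void main(String[] args) throws InterruptedException {
        // One latch per condition the old code polled for at 500ms intervals.
        CountDownLatch done = new CountDownLatch(1);

        Thread worker = new Thread(() -> {
            // ... perform the cleanup work ...
            done.countDown(); // signal completion; waiters wake immediately
        });
        worker.start();

        // Blocks until countDown() is called or the timeout elapses; returns
        // true iff the latch hit zero in time. No sleep interval to configure.
        boolean finished = done.await(5, TimeUnit.SECONDS);
        System.out.println(finished);
    }
}
```

The key difference from the poll loop is the wake-up latency: await returns as soon as the condition is met rather than on the next 500ms tick.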



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20662) Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760053#comment-16760053
 ] 

Hadoop QA commented on HBASE-20662:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
27s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 1s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} The patch passed checkstyle in hbase-client {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} hbase-server: The patch generated 0 new + 148 
unchanged - 4 fixed = 148 total (was 152) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 2s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 31s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
2s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}220m 37s{color} 
| {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}267m  9s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.locking.TestLockManager |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20662 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957504/HBASE-20662.master.010.patch
 |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux b7ff88418ea1 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 GNU/Linux |
| 

[jira] [Commented] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760034#comment-16760034
 ] 

Hadoop QA commented on HBASE-19616:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 6s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
51s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  1m 51s{color} 
| {color:red} hbase-server generated 2 new + 186 unchanged - 2 fixed = 188 
total (was 188) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
 8s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 
or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}129m 
27s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}166m 57s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-19616 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957510/HBASE-19616.3.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 83876c3e3a6a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 5f8bdd52a1 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| javac | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15873/artifact/patchprocess/diff-compile-javac-hbase-server.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15873/testReport/ |
| Max. process+thread count | 5087 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | 

[jira] [Commented] (HBASE-21201) Support to run VerifyReplication MR tool without peerid

2019-02-04 Thread Josh Elser (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760023#comment-16760023
 ] 

Josh Elser commented on HBASE-21201:


{quote}if we don't satisfy the assert condition (this.peerQuorumAddress != 
null), that's a bug. I usually use "assert" when I want to check an internal 
logic, and use an Exception when checking arguments or something outside. What 
do you think? we don't need it?
{quote}
I see. I was thinking that this could happen via user-input. I'm OK if you want 
to restore that assert on commit.

The only other thing I'm noticing is that we could stand to benefit from a UT 
that validates we expect at least two options. Not something you have to add 
here, but if you're looking to do some more clean-up here (checkstyle fixing), 
that'd be a nice addition to also make ;)

+1 to commit v2

> Support to run VerifyReplication MR tool without peerid
> ---
>
> Key: HBASE-21201
> URL: https://issues.apache.org/jira/browse/HBASE-21201
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase-operator-tools
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Sujit P
>Assignee: Toshihiro Suzuki
>Priority: Major
> Attachments: HBASE-21201.master.001.patch, 
> HBASE-21201.master.002.patch
>
>
> In some use cases, HBase clients write to tables in separate clusters (probably 
> in different datacenters) for redundancy. As an administrator/application 
> architect, I would like to find out if both clusters' tables are in the same 
> state (cell by cell). One of the tools that is readily available to use is 
> VerifyRep, which is part of replication.
> However, it requires a peerId to be set up on at least one of the involved 
> clusters. A peerId is unnecessary in this use-case scenario and possibly causes 
> unintended consequences, as the clusters aren't really replication peers, nor 
> do we want them to be.
> Looking at the code:
> Tool attempts to get only the clusterKey which is essentially ZooKeeper 
> quorum url
>  
> {code:java}
> //VerifyReplication.java
> private static Pair<ReplicationPeerConfig, Configuration>
> getPeerQuorumConfig(final Configuration conf, String peerId)
> .
> .
> return Pair.newPair(peerConfig,
>         ReplicationUtils.getPeerClusterConfiguration(peerConfig, conf));
> //ReplicationUtils.java
> public static Configuration getPeerClusterConfiguration(ReplicationPeerConfig 
> peerConfig, Configuration baseConf) throws ReplicationException {
> Configuration otherConf;
> try {
> otherConf = HBaseConfiguration.createClusterConf(baseConf, 
> peerConfig.getClusterKey());{code}
>  
>  
> So I would like to propose to update the tool to pass the remote cluster 
> ZkQuorum as an argument (ex. --peerQuorumAddress 
> clusterBzk1,clusterBzk2,clusterBzk3:2181/hbase-secure ) and use it 
> effectively without dependence on replication peerId, similar to 
> peerFSAddress. There are certain advantages in doing so, as follows:
>  * Reduce the development/maintenance of separate tool for above scenario
>  * Allow the tool to be more useful for other scenarios as well such as 
>  ** validating backups in remote cluster HBASE-19106
>  ** compare a cloned tableA and the original tableA in the same/remote cluster, 
> in case of user error, before restoring a snapshot to the original table, to 
> find the records that are added/invalid/missing etc.
>  ** Allow backup operators who are non-HBase admins (who shouldn't be adding 
> the peerId) to run the tool, since currently only an HBase superuser can add a 
> peerId, for reasons discussed in HBASE-21163.
> Please post your comments
> Thanks
> cc: [~clayb], [~brfrn169] , [~vrodionov] , [~rashidaligee]
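For illustration, the cluster-key format a --peerQuorumAddress argument would carry can be pulled apart like this (a hypothetical standalone parser; HBase's real parsing lives elsewhere, e.g. in ZKConfig):

```java
public class ClusterKeySketch {
    public static void main(String[] args) {
        // Cluster key shape: "host1,host2,host3:port/znodeParent"
        String key = "clusterBzk1,clusterBzk2,clusterBzk3:2181/hbase-secure";

        int colon = key.indexOf(':');          // separates quorum from port
        int slash = key.indexOf('/', colon);   // separates port from znode parent

        String quorum = key.substring(0, colon);
        String port   = key.substring(colon + 1, slash);
        String parent = key.substring(slash);

        // prints: clusterBzk1,clusterBzk2,clusterBzk3 | 2181 | /hbase-secure
        System.out.println(quorum + " | " + port + " | " + parent);
    }
}
```

The point of the proposal is that only these three pieces are needed to build a remote-cluster Configuration, so no ReplicationPeerConfig (and hence no registered peerId) is required.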



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759992#comment-16759992
 ] 

Hudson commented on HBASE-21840:


Results for branch branch-2.1
[build #832 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/832/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/832//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/832//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/832//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> And later the test is stuck, since the MVCC can not be advanced any more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759972#comment-16759972
 ] 

Wellington Chevreuil commented on HBASE-21843:
--

{quote}This means we have already scheduled a SCP for RS1-T1 and it has already 
finished?
{quote}
I guess it had not totally finished while the VMs were getting shut down, because 
meta still had R1 assigned to RS1-T1 when the cluster was up again. So it seems 
that, under a catastrophic failure, we do have a scenario where the WAL dir can 
be deleted before meta is properly updated. 

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but I managed to face this twice lately, in both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, things got 
> into a situation where regions in meta were assigned to a given RS startcode 
> that had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created for this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta: if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759971#comment-16759971
 ] 

Hudson commented on HBASE-21840:


Results for branch branch-2.0
[build #1316 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1316/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1316//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1316//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1316//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> And later the test is stuck, since the MVCC can not be advanced any more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759967#comment-16759967
 ] 

Duo Zhang commented on HBASE-21843:
---

{quote}
meta still has region R1 assigned to RS1-T1 in meta, but there's no RS1-T1 WAL 
dir.
{quote}

This means we have already scheduled a SCP for RS1-T1 and it has already 
finished?


> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but I managed to face this twice lately, in both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, things got 
> into a situation where regions in meta were assigned to a given RS startcode 
> that had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created for this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta: if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759961#comment-16759961
 ] 

Wellington Chevreuil commented on HBASE-21843:
--

The main issue is how the RS instance ended up with no WAL dir. I've seen this 
happen when the VM cluster I was running got paused/resumed. The cluster was 
running, and region R1 was assigned in meta to, say, RS1-TS1. Several host 
crashes/restarts later, the cluster came back up, now as, say, RS1-TS5, but meta 
still has region R1 assigned to RS1-TS1, and there's no RS1-TS1 WAL dir. In this 
case, R1 is not added to the list of offline regions during meta load (because 
its state is open), and there's never an SCP triggered for RS1-TS1 (because 
RegionServerTracker/ServerManager don't find any RS1-TS1 anywhere), so R1 will be 
forever open on RS1-TS1 in meta, never actually getting assigned to a live RS.
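The extra loadMeta check being proposed can be sketched in isolation (hypothetical stand-in types and method name; the real patch works against RegionInfo and ServerManager state):

```java
import java.util.Set;

public class MetaLoadCheckSketch {
    // A region recorded as OPEN on a server that is neither online nor has a
    // WAL dir left behind can never be recovered by an SCP, so treat it as
    // offline during meta load and let assignment pick it up.
    static boolean shouldMarkOffline(String regionState, String serverName,
                                     Set<String> onlineServers, Set<String> walDirs) {
        return "OPEN".equals(regionState)
            && !onlineServers.contains(serverName)
            && !walDirs.contains(serverName);
    }

    public static void main(String[] args) {
        Set<String> online = Set.of("RS1-TS5"); // current startcode after restart
        Set<String> wals   = Set.of("RS1-TS5"); // WAL dirs present on FS
        // R1 is still recorded in meta as open on the stale startcode RS1-TS1.
        System.out.println(shouldMarkOffline("OPEN", "RS1-TS1", online, wals)); // true
    }
}
```

The WAL-dir condition is what distinguishes this from the normal dead-server path: if a WAL dir existed, an SCP would (or should) already be responsible for the region.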

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a give RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759959#comment-16759959
 ] 

Hudson commented on HBASE-21819:


Results for branch branch-2.1
[build #831 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/831/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/831//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/831//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/831//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21816) Print source cluster replication config directory

2019-02-04 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-21816:
---
Priority: Trivial  (was: Major)

> Print source cluster replication config directory
> -
>
> Key: HBASE-21816
> URL: https://issues.apache.org/jira/browse/HBASE-21816
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 3.0.0, 2.0.0
> Environment: NA
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HBASE-21816-001.patch, HBASE-21816-002.patch, 
> HBASE-21816-003.patch
>
>
> Users may get confused when trying to understand which HBase configurations are 
> loaded for replication. Sometimes, users may place both source and destination 
> cluster conf under the "/etc/hbase/conf" directory. The current log line creates 
> uncertainty because it implies that all the configurations are co-located.
>  
> Existing Logs, 
> {code:java}
> INFO  [RpcServer.FifoWFPBQ.replication.handler=2,queue=0,port=16020] 
> regionserver.DefaultSourceFSConfigurationProvider: Loading source cluster 
> HDP1 file system configurations from xml files under directory 
> /etc/hbase/conf/
> {code}
> But it should be something like,
> {code:java}
> INFO  [RpcServer.FifoWFPBQ.replication.handler=2,queue=0,port=16020] 
> regionserver.DefaultSourceFSConfigurationProvider: Loading source cluster 
> HDP1 file system configurations from xml files under directory 
> /etc/hbase/conf/HDP1
> {code}
>  
> This jira only to change the log-line, no issue with the functionality. 
> {code:java}
> File confDir = new File(replicationConfDir, replicationClusterId);
> String[] listofConfFiles = FileUtil.list(confDir);
> for (String confFile : listofConfFiles) {
> if (new File(confDir, confFile).isFile() && confFile.endsWith(XML)) {
> // Add all the user provided client conf files
> sourceClusterConf.addResource(new Path(confDir.getPath(), confFile));
> }
> }
> {code}
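The intent of the change can be sketched with a small, hypothetical helper (`sourceConfDir` is not part of HBase; it just mirrors `new File(confDir.getPath())` from the snippet above): the log line should name the per-cluster subdirectory, not only the parent conf directory.

```java
public class ReplicationConfLog {
    // Hypothetical helper mirroring the snippet above: the directory actually
    // scanned is replicationConfDir + "/" + replicationClusterId, so that is
    // what the log line should report.
    static String sourceConfDir(String replicationConfDir, String replicationClusterId) {
        return replicationConfDir + "/" + replicationClusterId;
    }

    public static void main(String[] args) {
        String dir = sourceConfDir("/etc/hbase/conf", "HDP1");
        // The corrected log message names the cluster-specific directory:
        System.out.println("Loading source cluster HDP1 file system configurations"
            + " from xml files under directory " + dir);
    }
}
```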



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759952#comment-16759952
 ] 

Duo Zhang commented on HBASE-21843:
---

I think we need to find out the root cause. When would we assign a region to an 
RS that does not have a WAL dir yet? I think creating the WAL dir is part of the 
region server initialization?

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21843:
-
Attachment: HBASE-21843.master.001.patch

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Task
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21843:
-
Affects Version/s: 2.2.0
   3.0.0
   2.1.0

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 3.0.0, 2.1.0, 2.2.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21843:
-
Fix Version/s: 2.1.4
   2.2.0
   3.0.0
  Component/s: amv2
   Issue Type: Bug  (was: Task)

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.1.4
>
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HBASE-21843:
-
Fix Version/s: (was: 2.1.4)
   (was: 2.2.0)
   (was: 3.0.0)

> AM misses region assignment in catastrophic scenarios where RS assigned to 
> the region in Meta does not have a WAL dir.
> --
>
> Key: HBASE-21843
> URL: https://issues.apache.org/jira/browse/HBASE-21843
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Attachments: HBASE-21843.master.001.patch
>
>
> A bit unusual, but managed to face this twice lately on both distributed and 
> local standalone mode, on VMs. Somehow, after some VM pause/resume, got into 
> a situation where regions on meta were assigned to a given RS startcode that 
> had no corresponding WAL dir.
> That caused those regions to never get assigned, because the given RS 
> startcode is not found anywhere by RegionServerTracker/ServerManager, so no 
> SCP is created to this RS startcode, leaving the region "open" on a dead 
> server forever, in META.
> Could get this sorted by adding an extra check on loadMeta, checking if the RS 
> assigned to the region in meta is not online and doesn't have a WAL dir, then 
> mark this region as offline. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21843) AM misses region assignment in catastrophic scenarios where RS assigned to the region in Meta does not have a WAL dir.

2019-02-04 Thread Wellington Chevreuil (JIRA)
Wellington Chevreuil created HBASE-21843:


 Summary: AM misses region assignment in catastrophic scenarios 
where RS assigned to the region in Meta does not have a WAL dir.
 Key: HBASE-21843
 URL: https://issues.apache.org/jira/browse/HBASE-21843
 Project: HBase
  Issue Type: Task
Reporter: Wellington Chevreuil
Assignee: Wellington Chevreuil


A bit unusual, but managed to face this twice lately on both distributed and 
local standalone mode, on VMs. Somehow, after some VM pause/resume, got into a 
situation where regions on meta were assigned to a given RS startcode that had 
no corresponding WAL dir.

That caused those regions to never get assigned, because the given RS startcode 
is not found anywhere by RegionServerTracker/ServerManager, so no SCP is 
created to this RS startcode, leaving the region "open" on a dead server 
forever, in META.

Could get this sorted by adding an extra check on loadMeta, checking if the RS 
assigned to the region in meta is not online and doesn't have a WAL dir, then 
mark this region as offline. 
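The proposed check can be sketched as a small decision table. This is a simplified stand-in, not the actual patch: in the real code the lookups would go through ServerManager/RegionServerTracker and the WAL filesystem, and `serverOnline`/`hasWalDir` here are hypothetical inputs representing those lookups.

```java
public class MetaLoadCheck {
    enum RegionDecision { KEEP_OPEN, WAIT_FOR_SCP, MARK_OFFLINE }

    // Simplified stand-in for the extra check on loadMeta described above.
    static RegionDecision decide(boolean serverOnline, boolean hasWalDir) {
        if (serverOnline) {
            return RegionDecision.KEEP_OPEN;       // normal case: server still hosts the region
        }
        if (hasWalDir) {
            return RegionDecision.WAIT_FOR_SCP;    // a WAL dir exists, so an SCP can recover it
        }
        return RegionDecision.MARK_OFFLINE;        // dead startcode, no WAL dir: mark offline
    }

    public static void main(String[] args) {
        // The catastrophic scenario from the description: not online, no WAL dir.
        System.out.println(decide(false, false)); // prints "MARK_OFFLINE"
    }
}
```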



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21804) Remove 0.94 check from the Linkchecker job

2019-02-04 Thread Peter Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759918#comment-16759918
 ] 

Peter Somogyi commented on HBASE-21804:
---

[~jatsakthi], what do you think about completely removing 0.94 from 
hbase.apache.org?

I had a chat with [~busbey] and he had an idea to redirect users from our site 
to the [latest tarball|https://archive.apache.org/dist/hbase/hbase-0.94.27/] 
for 0.94 documentation. Since this tarball has the same content as 
hbase.apache.org/0.94, we could just send out an email to user@ asking people to 
use its documentation.

> Remove 0.94 check from the Linkchecker job
> --
>
> Key: HBASE-21804
> URL: https://issues.apache.org/jira/browse/HBASE-21804
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Major
> Attachments: hbase-21804.master.001.patch
>
>
> This is a pretty old release. Even though we don't have the link to the doc 
> from our main page, the linkchecker job lands directly at 
> [https://hbase.apache.org/0.94/], which has around 90-odd missing-file issues. 
> I haven't looked at the missing anchors yet.
> We can set linkchecker to not check 0.94.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19616:

Status: Patch Available  (was: Open)

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19616:

Description: 
* Parameterize logging
* Remove compiler-reported dead code to re-enable useful logging
* Use ArrayList instead of LinkedList

Used a CountDownLatch to replace a bunch of the existing code. The old code 
loops with a 500ms interval, checking whether some condition has been met, until 
the time spent looping exceeds some timeout value. 
Using a CountDownLatch allows one or more threads to wait until a set of 
operations being performed in other threads completes. It will not blindly 
sleep between checks and it will return immediately after the condition is met. 
This removes the HBase configuration that controls the sleep interval.

 

I also cleaned up the unit tests a bit and enhanced the logging of this class 
to ease troubleshooting.

  was:
* Parameterize logging
* Remove compiler-reported dead code to re-enable useful logging
* Use ArrayList instead of LinkedList


> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList
> Used a CountDownLatch to replace a bunch of the existing code. The old code 
> loops with a 500ms interval, checking whether some condition has been met, 
> until the time spent looping exceeds some timeout value. 
> Using a CountDownLatch allows one or more threads to wait until a set of 
> operations being performed in other threads completes. It will not blindly 
> sleep between checks and it will return immediately after the condition is 
> met. This removes the HBase configuration that controls the sleep interval.
>  
> I also cleaned up the unit tests a bit and enhanced the logging of this class 
> to ease troubleshooting.
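The pattern described above can be sketched as follows (hypothetical names, not the actual LogCleaner patch): the waiting thread blocks on `CountDownLatch.await(timeout, unit)` instead of polling on a fixed 500ms sleep, so it wakes as soon as the worker calls `countDown()` while still bounding the wait with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchWait {
    // Runs `work` on a worker thread and waits up to timeoutMs for it to finish.
    // Returns true as soon as countDown() fires, false if the timeout elapses.
    public static boolean runAndWait(Runnable work, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            work.run();
            done.countDown();   // signal completion; the waiter returns immediately
        });
        worker.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean completed = runAndWait(() -> { /* fast task */ }, 1000);
        System.out.println(completed); // prints "true"
    }
}
```

Because the waiter never sleeps on a fixed interval, there is no sleep-interval configuration left to tune, which is why the patch can drop that HBase setting.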



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21512) Introduce an AsyncClusterConnection and replace the usage of ClusterConnection

2019-02-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759882#comment-16759882
 ] 

Hudson commented on HBASE-21512:


Results for branch HBASE-21512
[build #87 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/87/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/87//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/87//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-21512/87//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Introduce an AsyncClusterConnection and replace the usage of ClusterConnection
> --
>
> Key: HBASE-21512
> URL: https://issues.apache.org/jira/browse/HBASE-21512
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0
>
>
> At least for the RSProcedureDispatcher, with CompletableFuture we do not need 
> to set a delay and use a thread pool any more, which could reduce the 
> resource usage and also the latency.
> Once this is done, I think we can remove the ClusterConnection completely, 
> and start to rewrite the old sync client based on the async client, which 
> could reduce the code base a lot for our client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19616:

Status: Open  (was: Patch Available)

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19616:

Priority: Minor  (was: Trivial)

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19616) Review of LogCleaner Class

2019-02-04 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HBASE-19616:

Attachment: HBASE-19616.3.patch

> Review of LogCleaner Class
> --
>
> Key: HBASE-19616
> URL: https://issues.apache.org/jira/browse/HBASE-19616
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HBASE-19616.1.patch, HBASE-19616.2.patch, 
> HBASE-19616.3.patch
>
>
> * Parameterize logging
> * Remove compiler-reported dead code to re-enable useful logging
> * Use ArrayList instead of LinkedList



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] meszibalu opened a new pull request #13: HBASE-21841 Allow inserting null values throw DataSource API

2019-02-04 Thread GitBox
meszibalu opened a new pull request #13: HBASE-21841 Allow inserting null 
values throw DataSource API
URL: https://github.com/apache/hbase-connectors/pull/13
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (HBASE-21839) Put up 2.1.3RC0

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759867#comment-16759867
 ] 

Duo Zhang commented on HBASE-21839:
---

Made a new 2.1.3RC0 tag at this commit
{noformat}
commit ab0c927080f5ad597d222d328023a3ec8f77d502 (HEAD -> branch-2.1, tag: 
2.1.3RC0, xiaomi/branch-2.1, origin/branch-2.1, asf/branch-2.1)
Author: zhangduo 
Date:   Mon Feb 4 21:10:13 2019 +0800

HBASE-21819 Addendum include new resolved issues
{noformat}

> Put up 2.1.3RC0
> ---
>
> Key: HBASE-21839
> URL: https://issues.apache.org/jira/browse/HBASE-21839
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21667) Move to latest ASF Parent POM

2019-02-04 Thread Peter Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759864#comment-16759864
 ] 

Peter Somogyi commented on HBASE-21667:
---

I found some issues while building the site. One is that some generated HTML 
files have new names because of the maven-project-info-reports-plugin upgrade 
(e.g. license.html -> licenses.html). This is easy to fix.
The other is related to the Fluido skin we use. The escape rule is still 
not correct, so HTML files get incorrect CSS styles and tables do not have any 
formatting.

I'm holding back the commit and will come back to this issue later.

> Move to latest ASF Parent POM
> -
>
> Key: HBASE-21667
> URL: https://issues.apache.org/jira/browse/HBASE-21667
> Project: HBase
>  Issue Type: Improvement
>  Components: build
>Reporter: Peter Somogyi
>Assignee: Peter Somogyi
>Priority: Minor
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21667.branch-2.0.001.patch, 
> HBASE-21667.branch-2.0.002.patch, HBASE-21667.branch-2.001.patch, 
> HBASE-21667.master.001.patch, HBASE-21667.master.002.patch
>
>
> Currently HBase depends on version 18 which was released on 2016-05-18. 
> Version 21 was released in August 2018.
> Relevant dependency upgrades
>  
> ||Name||Currently used version||New version||Notes||
> |surefire.version|2.21.0|2.22.0| |
> |maven-compiler-plugin|3.6.1|3.7| |
> |maven-dependency-plugin|3.0.1|3.1.1| |
> |maven-jar-plugin|3.0.0|3.0.2| |
> |maven-javadoc-plugin|3.0.0|3.0.1| |
> |maven-resources-plugin|2.7|3.1.0| |
> |maven-site-plugin|3.4|3.7.1|Currently not relying on ASF version. See: 
> HBASE-18333|
> |maven-source-plugin|3.0.0|3.0.1| |
> |maven-shade-plugin|3.0.0|3.1.1|Newly added to ASF pom|
> |maven-clean-plugin|3.0.0|3.1.0| |
> |maven-project-info-reports-plugin |2.9|3.0.0| |
> Version 21 added net.nicoulaj.maven.plugins:checksum-maven-plugin, which 
> introduced SHA-512 checksums instead of SHA-1. We should verify whether we can 
> rely on that for releases or whether it breaks our current processes.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21819:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed the addendum to branch-2.1.

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21812) Address ruby static analysis for bin module [2nd pass]

2019-02-04 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-21812:

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

thanks for the clean up [~jatsakthi]!

> Address ruby static analysis for bin module [2nd pass]
> --
>
> Key: HBASE-21812
> URL: https://issues.apache.org/jira/browse/HBASE-21812
> Project: HBase
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Sakthi
>Assignee: Sakthi
>Priority: Minor
> Fix For: 3.0.0, 2.3.0
>
> Attachments: hbase-21812.master.001.patch, 
> hbase-21812.master.002.patch
>
>
> -HBASE-18237- did a pass in the shell and bin directories. I think we can go 
> for another round. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759845#comment-16759845
 ] 

Hadoop QA commented on HBASE-21819:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  0m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21819 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957507/HBASE-21819-branch-2.1-addendum.patch
 |
| Optional Tests |  dupname  asflicense  |
| uname | Linux 05c6fce2355a 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | branch-2.1 / c0834e1823 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Max. process+thread count | 42 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15872/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21819:
--
Attachment: HBASE-21819-branch-2.1-addendum.patch

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759842#comment-16759842
 ] 

Hadoop QA commented on HBASE-21819:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HBASE-21819 does not apply to master. Rebase required? Wrong 
Branch? See https://yetus.apache.org/documentation/0.8.0/precommit-patchnames 
for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-21819 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12957505/HBASE-21819-addendum-branch-2.1.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/15871/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-addendum-branch-2.1.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21819:
--
Attachment: (was: HBASE-21819-addendum-branch-2.1.patch)

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-branch-2.1-addendum.patch, 
> HBASE-21819-branch-2.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang reopened HBASE-21819:
---

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-addendum-branch-2.1.patch, 
> HBASE-21819-branch-2.1.patch
>
>






[jira] [Updated] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21819:
--
Status: Patch Available  (was: Reopened)

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-addendum-branch-2.1.patch, 
> HBASE-21819-branch-2.1.patch
>
>






[jira] [Updated] (HBASE-21819) Generate CHANGES.md and RELEASENOTES.md for 2.1.3

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21819:
--
Attachment: HBASE-21819-addendum-branch-2.1.patch

> Generate CHANGES.md and RELEASENOTES.md for 2.1.3
> -
>
> Key: HBASE-21819
> URL: https://issues.apache.org/jira/browse/HBASE-21819
> Project: HBase
>  Issue Type: Sub-task
>  Components: documentation, release
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.3
>
> Attachments: HBASE-21819-addendum-branch-2.1.patch, 
> HBASE-21819-branch-2.1.patch
>
>






[jira] [Commented] (HBASE-20662) Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement

2019-02-04 Thread Nihal Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759840#comment-16759840
 ] 

Nihal Jain commented on HBASE-20662:


Thanks for the review [~jatsakthi]

bq. The extractQuotaSnapshot() does the null check and throws 
IllegalArgumentException. Would we want to use that exception?
Please see 
https://issues.apache.org/jira/browse/HBASE-20662?focusedCommentId=16738656=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16738656


bq. Also, we can do the "isEmpty()" check in extractQuotaSnapshot() as well 
along with null check.
I have added a row.length == 0 check.
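As a standalone illustration of the check under discussion (all names here are hypothetical sketches, not HBase's actual method signatures), the combined null and zero-length validation amounts to:

```java
public class RowValidation {
    // Mirrors the validation discussed above: a row key must be
    // non-null and non-empty before a quota snapshot is extracted.
    public static boolean isValidRow(byte[] row) {
        return row != null && row.length != 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidRow(new byte[] {1})); // valid
        System.out.println(isValidRow(new byte[0]));    // empty: invalid
        System.out.println(isValidRow(null));           // null: invalid
    }
}
```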

I have addressed your reviewboard comments in [^HBASE-20662.master.010.patch]. 
This patch also fixes a few typos in the test space quotas class.

Ping [~elserj], [~jatsakthi]

> Increasing space quota on a violated table does not remove 
> SpaceViolationPolicy.DISABLE enforcement
> ---
>
> Key: HBASE-20662
> URL: https://issues.apache.org/jira/browse/HBASE-20662
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.0.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20662.master.001.patch, 
> HBASE-20662.master.002.patch, HBASE-20662.master.003.patch, 
> HBASE-20662.master.004.patch, HBASE-20662.master.004.patch, 
> HBASE-20662.master.005.patch, HBASE-20662.master.006.patch, 
> HBASE-20662.master.007.patch, HBASE-20662.master.008.patch, 
> HBASE-20662.master.008.patch, HBASE-20662.master.009.patch, 
> HBASE-20662.master.009.patch, HBASE-20662.master.010.patch, screenshot.png
>
>
> *Steps to reproduce*
>  * Create a table and set a quota with {{SpaceViolationPolicy.DISABLE}} and a 
> limit of, say, 2MB
>  * Put rows until the space quota is violated and the table gets disabled
>  * Next, increase the space quota limit to, say, 4MB on the table
>  * Now try putting a row into the table
> {code:java}
>  private void testSetQuotaThenViolateAndFinallyIncreaseQuota() throws 
> Exception {
> SpaceViolationPolicy policy = SpaceViolationPolicy.DISABLE;
> Put put = new Put(Bytes.toBytes("to_reject"));
> put.addColumn(Bytes.toBytes(SpaceQuotaHelperForTests.F1), 
> Bytes.toBytes("to"),
>   Bytes.toBytes("reject"));
> // Do puts until we violate space policy
> final TableName tn = writeUntilViolationAndVerifyViolation(policy, put);
> // Now, increase limit
> setQuotaLimit(tn, policy, 4L);
> // Put some row now: should not violate as quota limit increased
> verifyNoViolation(policy, tn, put);
>   }
> {code}
> *Expected*
> We should be able to put data as long as newly set quota limit is not reached
> *Actual*
> We fail to put any new row even after increasing limit
> *Root cause*
> Increasing the quota on a violated table triggers an enable of the table, but 
> since the table is already in violation, the system does not allow it to be 
> enabled (presumably assuming that a user is trying to enable it manually)
> *Relevant exception trace*
> {noformat}
> 2018-05-31 00:34:27,563 INFO  [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> client.HBaseAdmin$14(844): Started enable of 
> testSetQuotaAndThenIncreaseQuotaWithDisable0
> 2018-05-31 00:34:27,571 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=42525] 
> ipc.CallRunner(142): callId: 11 service: MasterService methodName: 
> EnableTable size: 104 connection: 127.0.0.1:38030 deadline: 1527707127568, 
> exception=org.apache.hadoop.hbase.security.AccessDeniedException: Enabling 
> the table 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to 
> a violated space quota.
> 2018-05-31 00:34:27,571 ERROR [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> quotas.RegionServerSpaceQuotaManager(210): Failed to disable space violation 
> policy for testSetQuotaAndThenIncreaseQuotaWithDisable0. This table will 
> remain in violation.
> org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Enabling the table 
> 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to a 
> violated space quota.
>   at org.apache.hadoop.hbase.master.HMaster$6.run(HMaster.java:2275)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
>   at org.apache.hadoop.hbase.master.HMaster.enableTable(HMaster.java:2258)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.enableTable(MasterRpcServices.java:725)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at 

[jira] [Updated] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21795:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+.

Thanks [~nihaljain.cs] for contributing.

> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns to the client
> *Actual:* 
> The modify table procedure completes, but control does not return to the 
> client until the catalog janitor runs and deletes the parent region, or the future times out
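The blocking behavior described above can be sketched without any HBase types (plain java.util.concurrent; all names are hypothetical): the client's call returns only when a separately scheduled "janitor" task completes the pending future, or the wait times out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class StuckClientSketch {
    // Returns how long (ms) the "client" blocked before the
    // janitor-completed future let it continue.
    static long waitForModify(long janitorDelayMs) throws Exception {
        CompletableFuture<Void> modifyDone = new CompletableFuture<>();
        ScheduledExecutorService janitor = Executors.newSingleThreadScheduledExecutor();
        // The procedure itself has finished, but only the periodic
        // janitor task ever completes the client-visible future.
        janitor.schedule(() -> modifyDone.complete(null), janitorDelayMs, TimeUnit.MILLISECONDS);
        long start = System.nanoTime();
        modifyDone.get(5, TimeUnit.SECONDS); // client is stuck here
        janitor.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client waited ~" + waitForModify(200) + " ms");
    }
}
```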





[jira] [Updated] (HBASE-20662) Increasing space quota on a violated table does not remove SpaceViolationPolicy.DISABLE enforcement

2019-02-04 Thread Nihal Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nihal Jain updated HBASE-20662:
---
Attachment: HBASE-20662.master.010.patch

> Increasing space quota on a violated table does not remove 
> SpaceViolationPolicy.DISABLE enforcement
> ---
>
> Key: HBASE-20662
> URL: https://issues.apache.org/jira/browse/HBASE-20662
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.0.0
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20662.master.001.patch, 
> HBASE-20662.master.002.patch, HBASE-20662.master.003.patch, 
> HBASE-20662.master.004.patch, HBASE-20662.master.004.patch, 
> HBASE-20662.master.005.patch, HBASE-20662.master.006.patch, 
> HBASE-20662.master.007.patch, HBASE-20662.master.008.patch, 
> HBASE-20662.master.008.patch, HBASE-20662.master.009.patch, 
> HBASE-20662.master.009.patch, HBASE-20662.master.010.patch, screenshot.png
>
>
> *Steps to reproduce*
>  * Create a table and set a quota with {{SpaceViolationPolicy.DISABLE}} and a 
> limit of, say, 2MB
>  * Put rows until the space quota is violated and the table gets disabled
>  * Next, increase the space quota limit to, say, 4MB on the table
>  * Now try putting a row into the table
> {code:java}
>  private void testSetQuotaThenViolateAndFinallyIncreaseQuota() throws 
> Exception {
> SpaceViolationPolicy policy = SpaceViolationPolicy.DISABLE;
> Put put = new Put(Bytes.toBytes("to_reject"));
> put.addColumn(Bytes.toBytes(SpaceQuotaHelperForTests.F1), 
> Bytes.toBytes("to"),
>   Bytes.toBytes("reject"));
> // Do puts until we violate space policy
> final TableName tn = writeUntilViolationAndVerifyViolation(policy, put);
> // Now, increase limit
> setQuotaLimit(tn, policy, 4L);
> // Put some row now: should not violate as quota limit increased
> verifyNoViolation(policy, tn, put);
>   }
> {code}
> *Expected*
> We should be able to put data as long as newly set quota limit is not reached
> *Actual*
> We fail to put any new row even after increasing limit
> *Root cause*
> Increasing the quota on a violated table triggers an enable of the table, but 
> since the table is already in violation, the system does not allow it to be 
> enabled (presumably assuming that a user is trying to enable it manually)
> *Relevant exception trace*
> {noformat}
> 2018-05-31 00:34:27,563 INFO  [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> client.HBaseAdmin$14(844): Started enable of 
> testSetQuotaAndThenIncreaseQuotaWithDisable0
> 2018-05-31 00:34:27,571 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=42525] 
> ipc.CallRunner(142): callId: 11 service: MasterService methodName: 
> EnableTable size: 104 connection: 127.0.0.1:38030 deadline: 1527707127568, 
> exception=org.apache.hadoop.hbase.security.AccessDeniedException: Enabling 
> the table 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to 
> a violated space quota.
> 2018-05-31 00:34:27,571 ERROR [regionserver/root1-ThinkPad-T440p:0.Chore.1] 
> quotas.RegionServerSpaceQuotaManager(210): Failed to disable space violation 
> policy for testSetQuotaAndThenIncreaseQuotaWithDisable0. This table will 
> remain in violation.
> org.apache.hadoop.hbase.security.AccessDeniedException: 
> org.apache.hadoop.hbase.security.AccessDeniedException: Enabling the table 
> 'testSetQuotaAndThenIncreaseQuotaWithDisable0' is disallowed due to a 
> violated space quota.
>   at org.apache.hadoop.hbase.master.HMaster$6.run(HMaster.java:2275)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
>   at org.apache.hadoop.hbase.master.HMaster.enableTable(HMaster.java:2258)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.enableTable(MasterRpcServices.java:725)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> 

[jira] [Updated] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21840:
--
Component/s: (was: regionserver)
 test

> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> Afterwards the test gets stuck, since the MVCC cannot be advanced any more.
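As an illustration only (this is not the attached patch): the failure mode is a size check dereferencing a segment reference that a concurrent in-memory flush can momentarily leave null; a read-once defensive guard avoids the NPE. All names below are hypothetical stand-ins.

```java
public class ActiveSizeSketch {
    // Hypothetical stand-in for the active memstore segment, which a
    // concurrent in-memory flush may momentarily leave unset.
    private volatile int[] active;

    void setActive(int[] segment) {
        active = segment;
    }

    // Guarded variant of a checkActiveSize()-style method: read the
    // reference once and tolerate null instead of throwing NPE.
    boolean activeAboveThreshold(int threshold) {
        int[] snapshot = active;
        return snapshot != null && snapshot.length > threshold;
    }
}
```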





[GitHub] meszibalu opened a new pull request #12: HBASE-21842 Remove ${revision} from parent POMs

2019-02-04 Thread GitBox
meszibalu opened a new pull request #12: HBASE-21842 Remove ${revision} from 
parent POMs
URL: https://github.com/apache/hbase-connectors/pull/12
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (HBASE-21840) TestHRegionWithInMemoryFlush fails with NPE

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21840:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to branch-2.0+.

> TestHRegionWithInMemoryFlush fails with NPE
> ---
>
> Key: HBASE-21840
> URL: https://issues.apache.org/jira/browse/HBASE-21840
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21840.patch
>
>
> Found this one when testing 2.1.3.
> {noformat}
> Exception in thread "PutThread" java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.CompactingMemStore.checkActiveSize(CompactingMemStore.java:392)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.internalAdd(AbstractMemStore.java:307)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:132)
> at 
> org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:112)
> at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:750)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.applyToMemStore(HRegion.java:4420)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.access$500(HRegion.java:226)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.applyFamilyMapToMemStore(HRegion.java:3479)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.lambda$writeMiniBatchOperationsToMemStore$0(HRegion.java:3170)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.visitBatchOperations(HRegion.java:3103)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$BatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3162)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$MutationBatchOperation.writeMiniBatchOperationsToMemStore(HRegion.java:3644)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4058)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3991)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3922)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3913)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3927)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.doBatchMutate(HRegion.java:4254)
> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3046)
> {noformat}
> Afterwards the test gets stuck, since the MVCC cannot be advanced any more.





[jira] [Updated] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21795:
--
Component/s: amv2

> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.3, 2.0.5, 2.3.0
>
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns to the client
> *Actual:* 
> The modify table procedure completes, but control does not return to the 
> client until the catalog janitor runs and deletes the parent region, or the future times out





[jira] [Commented] (HBASE-21795) Client application may get stuck (time bound) if a table modify op is called immediately after split op

2019-02-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759789#comment-16759789
 ] 

Duo Zhang commented on HBASE-21795:
---

Oh shit, we do have a getAlterStatus check after we finish waiting for the procedure.

+1.

Let me include this in 2.1.3.

> Client application may get stuck (time bound) if a table modify op is called 
> immediately after split op
> ---
>
> Key: HBASE-21795
> URL: https://issues.apache.org/jira/browse/HBASE-21795
> Project: HBase
>  Issue Type: Bug
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Critical
> Attachments: HBASE-21795.master.001.patch
>
>
> *Steps:*
>  * Create a table
>  * Split the table
>  * Modify the table immediately after splitting
> *Expected*: 
> The modify table procedure completes and control returns to the client
> *Actual:* 
> The modify table procedure completes, but control does not return to the 
> client until the catalog janitor runs and deletes the parent region, or the future times out




