[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-10-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635054#comment-16635054
 ] 

Hudson commented on HBASE-20952:


Results for branch HBASE-20952
[build #5 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/5/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/5//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/5//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-20952/5//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. Backup doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635052#comment-16635052
 ] 

Hudson commented on HBASE-21258:


Results for branch branch-2
[build #1330 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1330/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1330//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1330//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1330//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Commented] (HBASE-21192) Add HOW-TO repair damaged AMv2.

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635045#comment-16635045
 ] 

stack commented on HBASE-21192:
---

h2. Master startup cannot progress, in holding-pattern until region onlined

If the cluster comes up and reports in the logs lines like the below:

{code}
2018-10-01 22:07:42,792 WARN org.apache.hadoop.hbase.master.HMaster: 
hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=CLOSING, 
ts=1538456302300, server=ve1017.halxg.cloudera.com,22101,1538449648131}; 
ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern 
until region onlined.
{code}

... there is no procedure to assign meta.

To inject one, use the hbck2 tool:

{code}
 HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase 
org.apache.hbase.HBCK2 assigns 1588230740
{code}

(1588230740 is the hard-coded encoded region name for hbase:meta -- the hbck2 
takes encoded region names).

You'll probably have to assign the hbase:namespace too if you had to assign 
meta. Look out for the encoded name of the namespace region... it'll be a line 
like this:

{code}
2018-10-01 22:09:49,681 WARN org.apache.hadoop.hbase.master.HMaster: 
hbase:namespace,,1526694055629.37cc206fe9c4bc1c0a46a34c5f523d16. is NOT online; 
state={37cc206fe9c4bc1c0a46a34c5f523d16 state=OPEN, ts=1538456987236, 
server=ve1233.halxg.cloudera.com,22101,1538441741767}; 
ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern 
until region onlined.
{code}

37cc206fe9c4bc1c0a46a34c5f523d16 is the encoded name of the namespace table 
region... 

(This stuff will be cleaned up more... just dropping note here for moment so 
don't forget when doing writeup...)
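
When several regions are stuck, picking the encoded names out of the Master log by hand gets tedious. Below is a minimal sketch, assuming the WARN line has exactly the shape shown above and has been unwrapped onto a single line. The class and method names are made up for illustration and are not part of HBase or hbck2.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HBase or hbck2): extracts the encoded
// region name from a Master "is NOT online" WARN line.
class NotOnlineLogLine {
  // Assumes the "state={<encoded-name> state=<STATE>, ..." layout seen in the excerpts above.
  private static final Pattern STATE =
      Pattern.compile("is NOT online; state=\\{(\\w+) state=\\w+,");

  // Returns the encoded region name, or empty if the line does not match.
  static Optional<String> encodedRegionName(String logLine) {
    Matcher m = STATE.matcher(logLine);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String line = "2018-10-01 22:09:49,681 WARN org.apache.hadoop.hbase.master.HMaster: "
        + "hbase:namespace,,1526694055629.37cc206fe9c4bc1c0a46a34c5f523d16. is NOT online; "
        + "state={37cc206fe9c4bc1c0a46a34c5f523d16 state=OPEN, ts=1538456987236, "
        + "server=ve1233.halxg.cloudera.com,22101,1538441741767}; ServerCrashProcedures=true.";
    System.out.println(encodedRegionName(line).orElse("no match"));
  }
}
```

The extracted name is what would then be passed to the hbck2 tool. If the log format differs on your version, the pattern will need adjusting.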


> Add HOW-TO repair damaged AMv2.
> ---
>
> Key: HBASE-21192
> URL: https://issues.apache.org/jira/browse/HBASE-21192
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
>
> Need a page or two on how to do various fixups. Will include doc on how to 
> identify particular circumstance, how to run a repair, as well as caveats 
> (e.g. if no log recovery, then region may be missing edits).
> Add pointer to log messages, especially those that explicitly ask for 
> operator intervention; e.g. Master#inMeta.





[jira] [Commented] (HBASE-21191) Add a holding-pattern if no assign for meta or namespace (Can happen if masterprocwals have been cleared).

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635043#comment-16635043
 ] 

stack commented on HBASE-21191:
---

Just to note that I just had a situation where meta was not assigned and we 
went into the "holding pattern" added here. I was able to do an assign of the 
meta and namespace using hbck2 and this caused us to move out of the "holding 
pattern" and continue startup.

> Add a holding-pattern if no assign for meta or namespace (Can happen if 
> masterprocwals have been cleared).
> --
>
> Key: HBASE-21191
> URL: https://issues.apache.org/jira/browse/HBASE-21191
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21191.branch-2.1.001.patch, 
> HBASE-21191.branch-2.1.002.patch, HBASE-21191.branch-2.1.003.patch, 
> HBASE-21191.branch-2.1.004.patch, HBASE-21191.branch-2.1.005.patch, 
> HBASE-21191.branch-2.1.006.patch, HBASE-21191.branch-2.1.007.patch
>
>
> If the masterprocwals have been removed -- operator error, hdfs dataloss, or 
> because we have gotten ourselves into a pathological state where we have 
> hundreds of masterprocwals to process and it is taking too long so we just 
> want to start over -- then master startup will have a dilemma. Master startup 
> needs hbase:meta to be online. If the masterprocwals have been removed, there 
> may be no outstanding assign or a servercrashprocedure with coverage for 
> hbase:meta (I ran into this issue repeatedly in internal testing purging 
> masterprocwals on a large test cluster). Worse, when master startup cannot 
> find an online hbase:meta, it exits after exhausting the RPC retries.
> So, we need a holding-pattern for master startup if hbase:meta is not online 
> if only so an operator can schedule an assign for meta or so they can assign 
> fixup procedures (HBASE-20786 has discussion on why we cannot just 
> auto-schedule an assign of meta).





[jira] [Commented] (HBASE-21259) [amv2] Revived deadservers; recreated serverstatenode

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635021#comment-16635021
 ] 

stack commented on HBASE-21259:
---

Let me back up [~allan163]. I was playing with a patch and it seems to bring on 
new sets of issues. Let me back it out and start over. I'll paste you a log 
then.

bq. It seems like a bug, but, adding a crashed server into ServerStateNode will 
cause any trouble?

I was freaked out by the number of SCPs: about one per region, of which 
there were hundreds, and then hundreds of different servers.

I'll be back.



> [amv2] Revived deadservers; recreated serverstatenode
> -
>
> Key: HBASE-21259
> URL: https://issues.apache.org/jira/browse/HBASE-21259
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
>
> On startup, I see servers being revived; i.e. their serverstatenode is 
> getting marked online even though it's just been processed by 
> ServerCrashProcedure. It looks like this (in a patched server that reports on 
> whenever a serverstatenode is created):
> {code}
> 2018-09-29 03:45:40,963 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, 
> state=SUCCESS; ServerCrashProcedure 
> server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, 
> meta=false in 1.0130sec
> ...
> 2018-09-29 03:45:43,733 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! 
> vb1442.halxg.cloudera.com,22101,1536675314426
> java.lang.RuntimeException: WHERE AM I?
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1716)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1494)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2022)
> {code}
> See how we've just finished a SCP which will have removed the 
> serverstatenode... but then we come across an unassign that references the 
> server that was just processed. The unassign will attempt to update the 
> serverstatenode and therein we create one if one is not present. We shouldn't be 
> creating one.
> I think I see this a lot because I am scheduling unassigns with hbck2. The 
> servers crash and then come up with SCPs doing cleanup of the old server while 
> unassign procedures are still waiting in the procedure executor queue. But it 
> could happen at any time on a cluster should an unassign get scheduled near 
> an SCP.
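
The get-or-create behaviour described above can be illustrated with a toy registry. This is a sketch only: `ServerNodeRegistry`, `getOrCreate`, `lookup`, and `remove` are hypothetical names, and a plain `ConcurrentHashMap` stands in for whatever `RegionStates` keeps internally.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Toy stand-in for server-state bookkeeping; all names are hypothetical.
class ServerNodeRegistry {
  private final ConcurrentMap<String, String> nodes = new ConcurrentHashMap<>();

  // Get-or-create: silently "revives" a server whose node was already removed.
  String getOrCreate(String serverName) {
    return nodes.computeIfAbsent(serverName, name -> "ONLINE");
  }

  // Lookup-only: a caller that updates state must cope with the node being gone.
  Optional<String> lookup(String serverName) {
    return Optional.ofNullable(nodes.get(serverName));
  }

  // What a crash-handling procedure would do once it is finished with the server.
  void remove(String serverName) {
    nodes.remove(serverName);
  }

  public static void main(String[] args) {
    ServerNodeRegistry registry = new ServerNodeRegistry();
    String server = "vb1442.halxg.cloudera.com,22101,1536675314426";
    registry.getOrCreate(server); // server registers
    registry.remove(server);      // crash processing finishes, node removed
    System.out.println("present after removal: " + registry.lookup(server).isPresent());
    registry.getOrCreate(server); // a late unassign touches the server
    System.out.println("present after late getOrCreate: " + registry.lookup(server).isPresent());
  }
}
```

In this toy model, a late caller using `lookup` sees that the server is gone and can react, whereas one using `getOrCreate` brings the entry back. Whether the real fix takes this shape is not stated in the issue.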





[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634862#comment-16634862
 ] 

Hadoop QA commented on HBASE-21221:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
11s{color} | {color:blue} Docker mode activated. {color} |
| {color:blue}0{color} | {color:blue} patch {color} | {color:blue}  0m  
2s{color} | {color:blue} The patch file was not named according to hbase's 
naming conventions. Please see 
https://yetus.apache.org/documentation/0.8.0/precommit-patchnames for 
instructions. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
28s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
12s{color} | {color:red} hbase-server: The patch generated 1 new + 20 unchanged 
- 0 fixed = 21 total (was 20) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
19s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
10m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}136m 
58s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21221 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12942050/21221.addendum.txt |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  
shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 81b4162ba775 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 4d7235ec54 |
| maven | version: Apache Maven 3.5.4 
(1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | 

[jira] [Updated] (HBASE-18549) Unclaimed replication queues can go undetected

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18549:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: (was: 1.3.3)
   2.1.1
   2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks for taking this one on [~xucang]

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where ZooKeeper issues 
> can cause NodeFailoverWorker to silently fail to pick up the replication queue 
> for a dead region server. One example is when the znode size for a particular 
> queue exceeds the jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.





[jira] [Updated] (HBASE-17890) FuzzyRowFilter fail if unaligned support is false

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-17890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-17890:
---
Fix Version/s: (was: 1.4.8)
   (was: 1.2.8)
   (was: 1.3.3)
   1.5.0

> FuzzyRowFilter fail if unaligned support is false
> -
>
> Key: HBASE-17890
> URL: https://issues.apache.org/jira/browse/HBASE-17890
> Project: HBase
>  Issue Type: Sub-task
>  Components: util
>Affects Versions: 1.2.5, 2.0.0
>Reporter: Jerry He
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-17890.v0.branch-1.patch, HBASE-17890.v0.patch, 
> HBASE-17890.v1.branch-1.patch, HBASE-17890.v1.patch, HBASE-17890.v2.patch, 
> HBASE-17890.v3.patch, HBASE-17890.v3.patch, HBASE-17890.v3.patch, 
> HBASE-17890.v3.patch, HBASE-17890.v3.patch
>
>
> When unaligned support is false, FuzzyRow tests fail:
> {noformat}
> Failed tests:
>   TestFuzzyRowAndColumnRangeFilter.Test:134->runTest:157->runScanner:186 
> expected:<10> but was:<0>
>   TestFuzzyRowFilter.testSatisfiesForward:81 expected: but was:
>   TestFuzzyRowFilter.testSatisfiesReverse:121 expected: but 
> was:
>   TestFuzzyRowFilterEndToEnd.testEndToEnd:247->runTest1:278->runScanner:343 
> expected:<6250> but was:<0>
>   TestFuzzyRowFilterEndToEnd.testFilterList:385->runTest:417->runScanner:445 
> expected:<5> but was:<0>
>   TestFuzzyRowFilterEndToEnd.testHBASE14782:204 expected:<6> but was:<0>
> {noformat}
> This can be reproduced in the case described in HBASE-17869. Or on a platform 
> really without unaligned support.





[jira] [Updated] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-14223:
---
Fix Version/s: (was: 1.4.8)
   (was: 1.2.8)
   (was: 1.3.3)

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, 
> hbase-14223_v3-master.patch
>
>
> When an RS opens meta, and later closes it, the WAL (FSHLog) is not closed. 
> The last WAL file just sits there in the RS WAL directory. If RS stops 
> gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for 
> meta is not cleaned. It is also not split (which is correct) since master 
> determines that the RS no longer hosts meta at the time of RS abort. 
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later, the master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/oldWALs/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
> 

[jira] [Updated] (HBASE-18415) The local timeout may cause Admin to submit duplicate request

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-18415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-18415:
---
Fix Version/s: (was: 1.4.8)
   (was: 1.3.3)

> The local timeout may cause Admin to submit duplicate request
> -
>
> Key: HBASE-18415
> URL: https://issues.apache.org/jira/browse/HBASE-18415
> Project: HBase
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-18415.branch-1.ut.patch, 
> HBASE-18415.branch-1.v0.patch, HBASE-18415.branch-1.v1.patch, 
> HBASE-18415.branch-1.v2.patch, HBASE-18415.branch-1.v3.patch, 
> HBASE-18415.branch-1.v3.patch, HBASE-18415.branch-1.v3.patch, 
> HBASE-18415.branch-1.v4.patch, HBASE-18415.branch-1.v4.patch, 
> HBASE-18415.branch-1.v4.patch
>
>
> After a timeout occurs on the first request, the client will retry the request 
> with a distinct group/nonce. The second request may bring the TableXXXException 
> back if the first request has changed the table state.
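
To see why the nonce on the retry matters, consider a toy server that remembers which nonces it has already accepted. This is a sketch under stated assumptions: `ToyMaster`, `submitDisable`, and the exception type are hypothetical and only mimic the shape of the problem. They are not the HBase Admin or nonce-manager API, and the issue text above does not say which fix was chosen.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of nonce-based deduplication; all names are hypothetical.
class ToyMaster {
  private final Set<String> disabledTables = new HashSet<>();
  private final Map<Long, String> seenNonces = new HashMap<>();

  // Disables a table. A repeated nonce returns the recorded outcome instead of re-running.
  String submitDisable(String table, long nonce) {
    String previous = seenNonces.get(nonce);
    if (previous != null) {
      return previous; // duplicate of an earlier request
    }
    if (!disabledTables.add(table)) {
      throw new IllegalStateException(table + " is already disabled");
    }
    String outcome = "disabled " + table;
    seenNonces.put(nonce, outcome);
    return outcome;
  }

  public static void main(String[] args) {
    ToyMaster master = new ToyMaster();
    master.submitDisable("t1", 1L); // first attempt; assume the client timed out locally
    System.out.println(master.submitDisable("t1", 1L)); // retry with the same nonce: deduplicated
    try {
      master.submitDisable("t1", 2L); // retry with a new nonce: treated as a new request
    } catch (IllegalStateException e) {
      System.out.println("retry failed: " + e.getMessage());
    }
  }
}
```

The second retry fails only because the first request had already changed the table state, which is the situation the description reports.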





[jira] [Commented] (HBASE-18549) Unclaimed replication queues can go undetected

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634834#comment-16634834
 ] 

Andrew Purtell commented on HBASE-18549:


Dropped this. Bringing it back. Let me see if more needs to be done before 
commit. Otherwise, will commit

> Unclaimed replication queues can go undetected
> --
>
> Key: HBASE-18549
> URL: https://issues.apache.org/jira/browse/HBASE-18549
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Ashu Pachauri
>Assignee: Xu Cang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.4.8
>
> Attachments: HBASE-18549-.master.001.patch, 
> HBASE-18549-.master.002.patch, HBASE-18549-.master.003.patch, 
> HBASE-18549-.master.004.patch, HBASE-18549.branch-1.001.patch, 
> HBASE-18549.branch-1.001.patch
>
>
> We have come across this situation multiple times where ZooKeeper issues 
> can cause NodeFailoverWorker to silently fail to pick up the replication queue 
> for a dead region server. One example is when the znode size for a particular 
> queue exceeds the jute.maxBuffer value.
> There can be other situations that may lead to this and just go undetected. 
> We need to have a metric for number of unclaimed replication queues. This 
> will help in mitigating the problem through alerting on the metric and 
> identifying underlying issues.





[jira] [Updated] (HBASE-19444) RSGroups test units cannot be concurrently executed

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-19444:
---
Fix Version/s: (was: 1.4.8)
   1.4.9

> RSGroups test units cannot be concurrently executed
> ---
>
> Key: HBASE-19444
> URL: https://issues.apache.org/jira/browse/HBASE-19444
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.0
>Reporter: Andrew Purtell
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.9
>
>
> TestRSGroups and friends cannot be concurrently executed or they are very 
> likely to flake, failing with constraint exceptions. If executed serially all 
> units pass. Fix for concurrent execution.





[jira] [Updated] (HBASE-20643) Getting HDFSBlockDist in Master by querying RegionServers

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20643:
---
Fix Version/s: (was: 1.4.8)

> Getting HDFSBlockDist in Master by querying RegionServers
> -
>
> Key: HBASE-20643
> URL: https://issues.apache.org/jira/browse/HBASE-20643
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Thiruvel Thirumoolan
>Assignee: Thiruvel Thirumoolan
>Priority: Major
> Fix For: 1.5.0, 2.2.0
>
>
> Region locality information is needed by the balancer to generate region 
> plans. Computing HDFSBlockDistribution is expensive on larger clusters and 
> adds load to the NameNode. This also needs to be recomputed on a master 
> restart. The proposal is to get the HDFSBlockDistribution from the 
> RegionServers instead of computing it in Master. RS already has this 
> information and we could just reuse it by querying it. RS already passes 
> dataLocality info via RegionLoad today.
> Proposed Implementation: This is a high-level overview.
> # A RegionServer API has to be added which will return HDFSBlockDistribution 
> for all the regions it hosts. RS already has this info. Since ClusterStatus 
> has already become bulky and we don’t need updated locality so fast, it’s 
> better to have another API rather than add this to RegionLoad and pass it 
> along with RSReport.
> # Master will have a Chore to query all RegionServers and will cache the 
> HDFSBlockDistribution for those regions. This is easy and quick. Admins can 
> tune the frequency based on size of the cluster. On a ~90-node cluster with 
> 500k regions and a prototype implementation and no load, it took about 5 
> seconds to get all HDFSBlockDistribution from RS.
> # The cache will be an extension of RegionLocationFinder (subclass), if 
> needed to keep the implementation simple. Probably will get clear with 
> implementation.
> # Balancer will use the new cache to get all HDFSBlockDistribution. If there 
> is a new region and Chore didn’t get the block distribution from RS during 
> its previous run, then it will be computed by RegionLocationFinder the same 
> way it is done now. If the Chore runs more frequently, e.g. every hour, 
> then this recomputation will be drastically reduced.
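
The cache-with-fallback behaviour described in the last two steps can be sketched roughly as follows. This is only an illustration under stated assumptions: BlockDistributionCache, refresh, get and the map-of-maps shape are hypothetical names chosen for the example, not HBase classes or the proposed patch, and the fallback function stands in for the computation RegionLocationFinder does today.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; none of these names are HBase APIs.
final class BlockDistributionCache {
    // region name -> (host name -> bytes of that region stored on the host)
    private final Map<String, Map<String, Long>> byRegion = new ConcurrentHashMap<>();
    // Stand-in for the expensive computation done on the Master today.
    private final Function<String, Map<String, Long>> fallback;
    private final AtomicInteger fallbackCalls = new AtomicInteger();

    BlockDistributionCache(Function<String, Map<String, Long>> fallback) {
        this.fallback = fallback;
    }

    // What the periodic chore would call with one RegionServer's response.
    void refresh(Map<String, Map<String, Long>> reported) {
        byRegion.putAll(reported);
    }

    // What the balancer would call: cached value if present, else fall back.
    Map<String, Long> get(String region) {
        Map<String, Long> cached = byRegion.get(region);
        if (cached != null) {
            return cached;
        }
        fallbackCalls.incrementAndGet();
        return fallback.apply(region);
    }

    int fallbackCalls() {
        return fallbackCalls.get();
    }

    public static void main(String[] args) {
        BlockDistributionCache cache = new BlockDistributionCache(
            region -> Collections.singletonMap("computed-host", 1L));
        cache.refresh(Collections.singletonMap(
            "regionA", Collections.singletonMap("rs1", 100L)));
        System.out.println(cache.get("regionA"));   // served from the cache
        System.out.println(cache.get("regionB"));   // new region, falls back
        System.out.println(cache.fallbackCalls());
    }
}
```

In this sketch a region that the chore has not yet seen costs one fallback computation per lookup; whether the fallback result should also be cached is a design choice the description leaves open.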



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20026) Add 1.4 release line to the JDK and Hadoop expectation tables

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20026:
---
Fix Version/s: (was: 1.4.8)
   1.4.9

> Add 1.4 release line to the JDK and Hadoop expectation tables
> -
>
> Key: HBASE-20026
> URL: https://issues.apache.org/jira/browse/HBASE-20026
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 1.4.0
>Reporter: Sean Busbey
>Priority: Critical
> Fix For: 1.4.9
>
>
> The ref guide currently doesn't have any expectations listed for branch-1.4 
> releases around JDK and Hadoop versions.
> Either add it, or maybe update the existing entries so we have "1.2, 1.3, 
> 1.4" in a single entry, unless we're ready to include something different 
> among them. (Maybe note the default Hadoop we ship with? Or Hadoop 2.8.2+ 
> moving to S maybe? If we've actually done any of the legwork.)





[jira] [Updated] (HBASE-20694) Consolidate warning on SecureBulkLoad directory permissions

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20694:
---
Fix Version/s: (was: 1.4.8)
   1.4.9

> Consolidate warning on SecureBulkLoad directory permissions
> ---
>
> Key: HBASE-20694
> URL: https://issues.apache.org/jira/browse/HBASE-20694
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.9
>
>
> Follow-on from HBASE-20605:
> HBase 1.x has a check which ignores a directory permission check if you're 
> using a specific filesystem which we think doesn't do security properly.
> HBase 2.x dropped this check.
> Since the security of bulk-loaded data is dependent upon this directory 
> permission (and thus the capabilities of the FileSystem), it would be better 
> to have a consistent warning across branches.
> [~busbey] suggested that we make a WARN message which points admins to our 
> Book (and write such a section if we don't have something sufficient already) 
> and our supported filesystems, coupled with an option to disable that warning 
> message.





[jira] [Updated] (HBASE-21013) Backport "read part" of HBASE-18754 to all active 1.x branches

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21013:
---
Fix Version/s: (was: 1.4.8)
   1.4.9

> Backport "read part" of HBASE-18754 to all active 1.x branches
> --
>
> Key: HBASE-21013
> URL: https://issues.apache.org/jira/browse/HBASE-21013
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Chia-Ping Tsai
>Assignee: Mingdao Yang
>Priority: Critical
> Fix For: 1.5.0, 1.3.3, 1.2.8, 1.4.9
>
>
> The hfiles impacted by HBASE-18754 will contain proto.TimeRangeTracker bytes. 
> All 1.x branches fail to read these hfiles because they can't deserialize 
> proto.TimeRangeTracker.





[jira] [Resolved] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-21261.

   Resolution: Fixed
Fix Version/s: 2.0.3
   2.1.1
   1.4.8
   2.2.0
   1.5.0
   3.0.0

> Add log4j.properties for hbase-rsgroup tests
> 
>
> Key: HBASE-21261
> URL: https://issues.apache.org/jira/browse/HBASE-21261
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Andrew Purtell
>Priority: Trivial
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1, 2.0.3
>
>
> When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log.
> Turns out that under hbase-rsgroup/src/test/resources there is no 
> log4j.properties.
> This issue adds log4j.properties for hbase-rsgroup tests.
> This would be useful when finding the root cause of hbase-rsgroup test 
> failure(s).





[jira] [Assigned] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-21261:
--

Assignee: Andrew Purtell

> Add log4j.properties for hbase-rsgroup tests
> 
>
> Key: HBASE-21261
> URL: https://issues.apache.org/jira/browse/HBASE-21261
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Andrew Purtell
>Priority: Minor
>
> When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log.
> Turns out that under hbase-rsgroup/src/test/resources there is no 
> log4j.properties.
> This issue adds log4j.properties for hbase-rsgroup tests.
> This would be useful when finding the root cause of hbase-rsgroup test 
> failure(s).





[jira] [Updated] (HBASE-21261) Add log4j.properties for hbase-rsgroup tests

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21261:
---
Priority: Trivial  (was: Minor)

Trivial test change, let me make it now

> Add log4j.properties for hbase-rsgroup tests
> 
>
> Key: HBASE-21261
> URL: https://issues.apache.org/jira/browse/HBASE-21261
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Andrew Purtell
>Priority: Trivial
>
> When I tried to debug TestRSGroups, at first I couldn't find any DEBUG log.
> Turns out that under hbase-rsgroup/src/test/resources there is no 
> log4j.properties.
> This issue adds log4j.properties for hbase-rsgroup tests.
> This would be useful when finding the root cause of hbase-rsgroup test 
> failure(s).





[jira] [Updated] (HBASE-19275) TestSnapshotFileCache never worked properly

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-19275:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.0.3
   2.1.1
   1.4.8
   2.2.0
   1.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for fixing this test [~xucang]

> TestSnapshotFileCache never worked properly
> ---
>
> Key: HBASE-19275
> URL: https://issues.apache.org/jira/browse/HBASE-19275
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: Andrew Purtell
>Assignee: Xu Cang
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8, 2.1.1, 2.0.3
>
> Attachments: HBASE-19275-branch-1.patch, 
> HBASE-19275-master.001.patch, HBASE-19275-master.001.patch
>
>
> Error-prone noticed we were asking Iterables.contains() questions with the 
> wrong type in TestSnapshotFileCache. I've attached a fixed version of the 
> test. The results suggest the cache is not evicting entries properly. 
> {noformat}
> java.lang.AssertionError: Cache found 
> 'hdfs://localhost:52867/user/apurtell/test-data/8ce04c85-ce4b-4844-b454-5303482ade95/data/default/snapshot1/9e49edd0ab41657fb0c6ebb4d9dfad15/cf/f132e5b06f66443f8003363ed1535aac',
>  but it shouldn't have.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshot(TestSnapshotFileCache.java:260)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshotV1(TestSnapshotFileCache.java:206)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.testReloadModifiedDirectory(TestSnapshotFileCache.java:102)
> {noformat}
> {noformat}
> java.lang.AssertionError: Cache found 
> 'hdfs://localhost:52867/user/apurtell/test-data/8ce04c85-ce4b-4844-b454-5303482ade95/data/default/snapshot1a/2e81adb9212c98cff970eafa006fc40b/cf/a2ec478d850e4e348359699c53b732c4',
>  but it shouldn't have.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshot(TestSnapshotFileCache.java:260)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshotV1(TestSnapshotFileCache.java:206)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.testLoadAndDelete(TestSnapshotFileCache.java:88)
> {noformat}
> These changes are part of HBASE-19239
> I've disabled the offending test cases with @Ignore in that patch, but they 
> should be reenabled and fixed. 
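
The kind of mistake described above, a containment check asked with the wrong element type, can be illustrated with a small sketch. It assumes java.util.Collection.contains as a stand-in for Guava's Iterables.contains (both accept an Object, so the mismatch compiles) and uses java.nio.file.Path instead of the Hadoop types; the class and method names are hypothetical and this is not the code from TestSnapshotFileCache.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a containment check with the wrong type.
final class ContainsWrongTypeDemo {
    // Compiles because contains() takes Object, but a String never equals
    // a Path, so this returns false no matter what the list holds.
    static boolean wrongTypeLookup(List<Path> files, String name) {
        return files.contains(name);
    }

    // Converts the argument to the element type before asking.
    static boolean correctLookup(List<Path> files, String name) {
        return files.contains(Paths.get(name));
    }

    public static void main(String[] args) {
        List<Path> cached = Arrays.asList(Paths.get("cf/hfile1"), Paths.get("cf/hfile2"));
        // An assertFalse() on the first call would pass vacuously.
        System.out.println(wrongTypeLookup(cached, "cf/hfile1"));
        System.out.println(correctLookup(cached, "cf/hfile1"));
    }
}
```

An assertFalse wrapped around the wrong-type lookup always passes, which is one way a test can appear green while the behaviour it is meant to check is broken.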





[jira] [Comment Edited] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634768#comment-16634768
 ] 

Andrew Purtell edited comment on HBASE-21258 at 10/2/18 12:07 AM:
--

I seem to have looked at "21258.v1.txt" but this was not what was proposed for 
commit to branch-1 or what was committed there. I have an objection to what was 
committed. It in effect clones TestRSGroups, but ignores all of its tests, and 
adds two more. This isn't the way it should be done. Shouldn't have to explain 
why, but if you need it, just consider what ignoring 22 units looks like in 
junit output... unnecessarily suspicious. Also, if anyone adds a test to 
TestRSGroups but fails to add an @Ignore in TestRSGroups1, the new test 
runs twice. We don't need this "TestRSGroups1" thing anyway. I will replace the 
branch-1 commit with what I expected, a port of 21258.v1.txt to branch-1. 


was (Author: apurtell):
I seem to have looked at "21258.v1.txt" but this was not what was proposed for 
commit to branch-1 or what was committed there. I have objection to what was 
committed. It in effect clones TestRSGroups, but ignores all of its tests, and 
adds two more. This isn't the way it should be done. Shouldn't have to explain 
why, but if you need it, just consider what ignoring 22 units looks like in 
junit output... unnecessarily, suspicious. We don't need this "TestRSGroups1" 
thing anyway. I will replace the branch-1 commit with what I expected, a port 
of 21258.v1.txt to branch-1. 

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.
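
Why missing flag resets matter can be shown with a small sketch in plain Java. It assumes a shared observer whose boolean flags record which hooks fired; HookFlags, moveTables and the other names are hypothetical and are not taken from TestRSGroups. The reset() call plays the role a JUnit @Before method would play.

```java
// Hypothetical stand-in for an observer that records which hooks fired.
final class HookFlags {
    boolean preMoveCalled;
    boolean postMoveCalled;

    void reset() {
        preMoveCalled = false;
        postMoveCalled = false;
    }
}

final class ResetFlagsDemo {
    // An operation that correctly fires both hooks.
    static void moveTables(HookFlags flags) {
        flags.preMoveCalled = true;
        flags.postMoveCalled = true;
    }

    // A broken operation that fires no hooks at all.
    static void brokenMoveTables(HookFlags flags) {
        // intentionally empty
    }

    // Runs two "subtests" against shared flags and returns what the second
    // subtest's hook check would observe.
    static boolean secondSubtestSeesHook(boolean resetBetween) {
        HookFlags flags = new HookFlags();
        moveTables(flags);           // subtest 1 sets the flags
        if (resetBetween) {
            flags.reset();           // what a JUnit @Before method would do
        }
        brokenMoveTables(flags);     // subtest 2 exercises the broken path
        return flags.postMoveCalled;
    }

    public static void main(String[] args) {
        System.out.println(secondSubtestSeesHook(false)); // stale flag leaks through
        System.out.println(secondSubtestSeesHook(true));  // broken path is detected
    }
}
```

Without the reset, the second subtest sees the flag left behind by the first and its check passes even though the hook never fired.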





[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634791#comment-16634791
 ] 

Andrew Purtell commented on HBASE-21117:


Proceeding with commit

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out HBASE-18350 did not get ported to 
> branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it 
> fixes 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to port only part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch those, 
> since AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}





[jira] [Resolved] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-21117.

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 1.4.8
   1.5.0

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out HBASE-18350 did not get ported to 
> branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it 
> fixes 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to port only part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch those, 
> since AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}





[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634790#comment-16634790
 ] 

Andrew Purtell commented on HBASE-21258:


I committed 21258.v1.txt amended with the removal of a handful of asserts that 
have not been testing what they think they are testing, as revealed by the 
21258.v1.txt change. These asserts are incidental to what is being tested by 
the respective units. I'm not happy with this but more substantial changes 
should have a separate followup. TestRSGroups should be rewritten to avoid 
catch-all initialization and cleanup steps in @Before and @After methods. It 
also has a running time of 230 seconds. Could stand to be split up 8 ways (or 
more). 

This unblocks HBASE-21117, proceeding there. 

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634776#comment-16634776
 ] 

Andrew Purtell commented on HBASE-21258:


Yes, a small modification is needed. I haven't committed anything yet. 

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634775#comment-16634775
 ] 

Xu Cang commented on HBASE-21258:
-

[~apurtell]

If you apply the ported 21258.v1.txt to branch-1, I think there is going to be a 
test failure.

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Resolved] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-21258.

   Resolution: Fixed
Fix Version/s: 1.4.8
   1.5.0

The branch-2 patch applies without any changes needed. Resolving this as fixed. 
If additional changes are needed, let's open a new issue rather than do 
something radical with a branch-1 patch.

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634768#comment-16634768
 ] 

Andrew Purtell commented on HBASE-21258:


I seem to have looked at "21258.v1.txt" but this was not what was proposed for 
commit to branch-1 or what was committed there. I have objection to what was 
committed. It in effect clones TestRSGroups, but ignores all of its tests, and 
adds two more. This isn't the way it should be done. Shouldn't have to explain 
why, but if you need it, just consider what ignoring 22 units looks like in 
junit output... unnecessarily, suspicious. We don't need this "TestRSGroups1" 
thing anyway. I will replace the branch-1 commit with what I expected, a port 
of 21258.v1.txt to branch-1. 

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Reopened] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reopened HBASE-21258:


Pardon me, there has been a review error. Reopening because I'm reverting what 
was committed to branch-1. 

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Updated] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21258:
---
Fix Version/s: (was: 1.4.8)
   (was: 1.5.0)

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Updated] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-21258:
---
Fix Version/s: 1.4.8

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Comment Edited] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634744#comment-16634744
 ] 

Ted Yu edited comment on HBASE-21221 at 10/1/18 11:05 PM:
--

I noticed that the current test would pass even if MultiRowMutationEndpoint is 
not registered.
In the test output:
{code}
2018-10-01 15:55:15,749 DEBUG [hconnection-0x589a90eb-shared-pool13-t1] 
client.TestFromClientSide3(855): 
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: 
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for MultiRowMutationService in region 
testMultiRowMutations,,1538434514918.8d59d9ae0e4652161a3048075502367a.
  at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8223)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2484)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2466)
  at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42010)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
Attaching an addendum to exclude this scenario.


was (Author: yuzhih...@gmail.com):
I noticed that the current test would pass even if MultiRowMutationEndpoint is 
not registered.
In the test output:
{code}
2018-10-01 15:55:15,749 DEBUG [hconnection-0x589a90eb-shared-pool13-t1] 
client.TestFromClientSide3(855):  ted
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: 
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for MultiRowMutationService in region 
testMultiRowMutations,,1538434514918.8d59d9ae0e4652161a3048075502367a.
  at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8223)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2484)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2466)
  at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42010)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
Attaching an addendum to exclude this scenario.

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, 
> 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> 

[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
Status: Patch Available  (was: Reopened)

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, 
> 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.
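
The behaviour described above can be reproduced outside HBase. The following is a minimal sketch, assuming only the JDK; the class and method names are hypothetical and it is not the patch attached to this issue. It shows that an AssertionError thrown in a task passed to execute() stays on the pool thread, while the same task passed to submit() surfaces the error to the caller through the Future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from TestFromClientSide3.
class ExecutorAssertionSketch {

  // Stand-in for the fail(...) call made inside the test's lambda.
  static final Runnable FAILING_TASK = () -> {
    throw new AssertionError("This cp should fail");
  };

  // execute(): the error is handled on the pool thread (stack trace on stderr)
  // and the calling thread carries on as if nothing had happened.
  static boolean callerContinuesAfterExecute() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      pool.execute(FAILING_TASK);
      pool.shutdown();
      return pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  // submit(): the error is stored in the Future and rethrown by get(),
  // wrapped in an ExecutionException, on the calling thread.
  static Throwable failureSeenThroughFuture() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<?> result = pool.submit(FAILING_TASK);
      result.get();
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("caller continued: " + callerContinuesAfterExecute());
    System.out.println("failure via Future: " + failureSeenThroughFuture());
  }
}
```

One possible way for a test to make the assertion count is to keep the Future (or record the failure in a shared variable) and check it on the test thread after the task completes.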



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-21221:


> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, 
> 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.





[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
Attachment: 21221.addendum.txt

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.addendum.txt, 21221.v10.txt, 21221.v11.txt, 
> 21221.v12.txt, 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.





[jira] [Commented] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-10-01 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634744#comment-16634744
 ] 

Ted Yu commented on HBASE-21221:


I noticed that the current test would pass even if MultiRowMutationEndpoint is 
not registered.
In the test output:
{code}
2018-10-01 15:55:15,749 DEBUG [hconnection-0x589a90eb-shared-pool13-t1] 
client.TestFromClientSide3(855):  ted
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: 
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for MultiRowMutationService in region 
testMultiRowMutations,,1538434514918.8d59d9ae0e4652161a3048075502367a.
  at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8223)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2484)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2466)
  at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42010)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
Attaching an addendum to exclude this scenario.
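
To illustrate the scenario, here is a minimal sketch, assuming only the JDK, of telling the expected lock timeout apart from any other failure by walking the cause chain. The class and method names are hypothetical, the message strings are copied from the logs quoted in this thread, and this is not the content of the addendum.

```java
import java.io.IOException;

// Hypothetical sketch; not the addendum attached to this issue.
class ExpectedFailureSketch {

  // Returns true only when some exception in the cause chain carries the
  // row-lock timeout message; any other failure is not the expected one.
  static boolean isRowLockTimeout(Throwable failure) {
    for (Throwable cur = failure; cur != null; cur = cur.getCause()) {
      String msg = cur.getMessage();
      if (msg != null && msg.contains("Timed out waiting for lock for row")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable lockTimeout =
        new RuntimeException(new IOException("Timed out waiting for lock for row: ROW-1"));
    Throwable notRegistered = new IOException(
        "No registered coprocessor service found for MultiRowMutationService");
    System.out.println(isRowLockTimeout(lockTimeout));
    System.out.println(isRowLockTimeout(notRegistered));
  }
}
```

A test that sets its "threw" flag only when such a check passes would no longer succeed when the endpoint is simply missing. Matching on the exception type rather than the message would be sturdier where the type is available.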

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.v10.txt, 21221.v11.txt, 21221.v12.txt, 
> 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt :
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by 
> previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.





[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-10-01 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634735#comment-16634735
 ] 

Xu Cang commented on HBASE-21117:
-

[~apurtell]
The 002 patch is good to go. I just tested locally and all RSGroups tests passed. 

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to port only part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch them since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}





[jira] [Updated] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21258:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.2.0
   1.5.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the reviews.

> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.
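
As an illustration of why the reset matters, here is a minimal sketch, assuming only the JDK. The observer class, its flag names and the helper method are hypothetical and are not taken from TestRSGroups.

```java
// Hypothetical sketch; names are not taken from TestRSGroups.
class HookFlagsSketch {

  // Stand-in for an observer that records which pre/post hooks have fired.
  static class RecordingObserver {
    boolean preMoveTablesCalled;
    boolean postMoveTablesCalled;

    void preMoveTables() { preMoveTablesCalled = true; }
    void postMoveTables() { postMoveTablesCalled = true; }

    // What a per-subtest setup step would call before each subtest starts.
    void resetFlags() {
      preMoveTablesCalled = false;
      postMoveTablesCalled = false;
    }
  }

  // Simulates two subtests sharing one observer. The first triggers the hooks;
  // the return value is what the second subtest sees before doing any work.
  static boolean secondSubtestSeesStaleFlag(boolean resetBetweenSubtests) {
    RecordingObserver observer = new RecordingObserver();
    observer.preMoveTables();
    observer.postMoveTables();
    if (resetBetweenSubtests) {
      observer.resetFlags();
    }
    return observer.postMoveTablesCalled;
  }

  public static void main(String[] args) {
    System.out.println("without reset: " + secondSubtestSeesStaleFlag(false));
    System.out.println("with reset:    " + secondSubtestSeesStaleFlag(true));
  }
}
```

Without the reset, a later subtest that asserts a hook was called could pass on the strength of an earlier subtest's call.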





[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-10-01 Thread Xu Cang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634687#comment-16634687
 ] 

Xu Cang commented on HBASE-21117:
-

Yes, I applied the .002 patch on top of the HBASE-21258 v5 patch. It works 
well and all unit tests pass.
Let me verify once more after Ted pushes his patch before declaring this patch 
fully ready.

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to port only part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch them since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}





[jira] [Commented] (HBASE-21117) Backport HBASE-18350 (fix RSGroups) to branch-1 (Only port the part fixing table locking issue.)

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634683#comment-16634683
 ] 

Andrew Purtell commented on HBASE-21117:


Thanks [~xucang], looks like HBASE-21258 will be committed soon by Ted

> Backport HBASE-18350  (fix RSGroups)  to branch-1 (Only port the part fixing 
> table locking issue.)
> --
>
> Key: HBASE-21117
> URL: https://issues.apache.org/jira/browse/HBASE-21117
> Project: HBase
>  Issue Type: Bug
>  Components: backport, rsgroup, shell
>Affects Versions: 1.3.2
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Major
>  Labels: backport
> Attachments: HBASE-21117-branch-1.001.patch, 
> HBASE-21117-branch-1.002.patch
>
>
> When working on HBASE-20666, I found out that HBASE-18350 did not get ported 
> to branch-1, which sometimes causes a procedure to hang when #moveTables is 
> called. After looking into the 18350 patch, it seems important since it fixes 
> 4 issues. This Jira is an attempt to backport it to branch-1.
>  
>  
> Edited: Aug 26.
> After reviewing the HBASE-18350 patch, I decided to port only part 2 of the 
> patch, because part 1 and part 3 are AMv2 related. I won't touch them since 
> AMv2 is only for branch-2.
>  
> {quote} 
> Subject: [PATCH] HBASE-18350 RSGroups are broken under AMv2
> - Table moving to RSG was buggy, because it left the table unassigned.
>   Now it is fixed we immediately assign to an appropriate RS
>   (MoveRegionProcedure).
> *- Table was locked while moving, but unassign operation hung, because*
>   *locked table queues are not scheduled while locked. Fixed.    port 
> this one.*
> - ProcedureSyncWait was buggy, because it searched the procId in
>   executor, but executor does not store the return values of internal
>   operations (they are stored, but immediately removed by the cleaner).
> - list_rsgroups in the shell show also the assigned tables and servers.
> {quote}





[jira] [Commented] (HBASE-21258) Add resetting of flags for RS Group pre/post hooks in TestRSGroups

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634682#comment-16634682
 ] 

Andrew Purtell commented on HBASE-21258:


+1


> Add resetting of flags for RS Group pre/post hooks in TestRSGroups
> --
>
> Key: HBASE-21258
> URL: https://issues.apache.org/jira/browse/HBASE-21258
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Major
> Attachments: 21258.branch-1.04.txt, 21258.branch-1.05.txt, 
> 21258.branch-2.v1.patch, 21258.v1.txt
>
>
> Over HBASE-20627, [~xucang] reminded me that the resetting of flags for RS 
> Group pre/post hooks in TestRSGroups was absent.
> This issue is to add the resetting of these flags before each subtest starts.





[jira] [Commented] (HBASE-19275) TestSnapshotFileCache never worked properly

2018-10-01 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634678#comment-16634678
 ] 

Andrew Purtell commented on HBASE-19275:


+1
Thanks [~xucang], I will look at committing this today.

> TestSnapshotFileCache never worked properly
> ---
>
> Key: HBASE-19275
> URL: https://issues.apache.org/jira/browse/HBASE-19275
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: Andrew Purtell
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-19275-branch-1.patch, 
> HBASE-19275-master.001.patch, HBASE-19275-master.001.patch
>
>
> Error-prone noticed we were asking Iterables.contains() questions with the 
> wrong type in TestSnapshotFileCache. I've attached a fixed version of the 
> test. The results suggest the cache is not evicting entries properly. 
> {noformat}
> java.lang.AssertionError: Cache found 
> 'hdfs://localhost:52867/user/apurtell/test-data/8ce04c85-ce4b-4844-b454-5303482ade95/data/default/snapshot1/9e49edd0ab41657fb0c6ebb4d9dfad15/cf/f132e5b06f66443f8003363ed1535aac',
>  but it shouldn't have.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshot(TestSnapshotFileCache.java:260)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshotV1(TestSnapshotFileCache.java:206)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.testReloadModifiedDirectory(TestSnapshotFileCache.java:102)
> {noformat}
> {noformat}
> java.lang.AssertionError: Cache found 
> 'hdfs://localhost:52867/user/apurtell/test-data/8ce04c85-ce4b-4844-b454-5303482ade95/data/default/snapshot1a/2e81adb9212c98cff970eafa006fc40b/cf/a2ec478d850e4e348359699c53b732c4',
>  but it shouldn't have.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshot(TestSnapshotFileCache.java:260)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.createAndTestSnapshotV1(TestSnapshotFileCache.java:206)
>   at 
> org.apache.hadoop.hbase.master.snapshot.TestSnapshotFileCache.testLoadAndDelete(TestSnapshotFileCache.java:88)
> {noformat}
> These changes are part of HBASE-19239
> I've disabled the offending test cases with @Ignore in that patch, but they 
> should be reenabled and fixed. 
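
The wrong-type pitfall mentioned above can be shown with plain JDK collections. This is a minimal sketch under stated assumptions: java.nio.file.Path and List.contains stand in for the Hadoop Path and Guava's Iterables.contains used by the real test, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; stand-in types, not the TestSnapshotFileCache code.
class WrongTypeContainsSketch {

  static List<Path> sampleFiles() {
    return Arrays.asList(Paths.get("cf", "f1"), Paths.get("cf", "f2"));
  }

  // Compiles because contains() accepts any Object, but a String is never
  // equal to a Path, so this always answers false.
  static boolean containsAskedWithString(List<Path> files, String name) {
    return files.contains(name);
  }

  // Asking with the element type gives the real answer.
  static boolean containsAskedWithPath(List<Path> files, Path file) {
    return files.contains(file);
  }

  public static void main(String[] args) {
    List<Path> files = sampleFiles();
    Path f1 = Paths.get("cf", "f1");
    System.out.println(containsAskedWithString(files, f1.toString()));
    System.out.println(containsAskedWithPath(files, f1));
  }
}
```

An assertFalse built on the String variant can never fail, which is consistent with the test only starting to report eviction problems once the types were corrected.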





[jira] [Comment Edited] (HBASE-21259) [amv2] Revived deadservers; recreated serverstatenode

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634569#comment-16634569
 ] 

stack edited comment on HBASE-21259 at 10/1/18 8:28 PM:


Scenario is this:

 * meta has regions that reference a regionserver that is long gone. It was 
processed (or not if all MasterProcWALs have been removed) many restarts ago.
 * The table is borked. Some regions are not unassigned though their table is.
 * We run a mass unassign. Because table has many unassigned regions, it takes 
a while.
 * The first unassign queues a SCP for the long-dead server. It quickly runs 
through the SCP and finishes.. no logs to split.
 * Soon after, another scheduled unassign for same server is run. It queues an 
SCP (remember, if the unassign is against a server that is not online, we queue 
SCP and then wait on the SCP to wake the unassign so we do proper unassign 
cleanup in the handleRIT callback)... only in this case, the server is in the 
deadserver list and has been processed so this last assign just hangs for 
ever because the check for server state creates a new serverstatenode and new 
serverstatenodes default ONLINE.
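
The last step of the scenario can be sketched in isolation. This is a minimal model, assuming only the JDK; the class, its methods and the map-based registry are hypothetical and do not mirror the actual RegionStates code. It shows how a get-or-create lookup makes a removed server look ONLINE again, while a plain lookup does not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model; not the RegionStates implementation.
class ServerStateSketch {

  enum State { ONLINE, OFFLINE }

  static final class Node {
    // Assumption taken from the comment above: new nodes default to ONLINE.
    volatile State state = State.ONLINE;
  }

  private final Map<String, Node> nodes = new ConcurrentHashMap<>();

  // Creates a node when none exists, which is what revives a dead server.
  Node getOrCreate(String serverName) {
    return nodes.computeIfAbsent(serverName, name -> new Node());
  }

  // Lookup only; returns null for a server that has already been processed.
  Node getOrNull(String serverName) {
    return nodes.get(serverName);
  }

  // Stand-in for crash handling removing the node once it has finished.
  void remove(String serverName) {
    nodes.remove(serverName);
  }

  boolean looksOnlineViaLookup(String serverName) {
    Node node = getOrNull(serverName);
    return node != null && node.state == State.ONLINE;
  }

  boolean looksOnlineViaGetOrCreate(String serverName) {
    return getOrCreate(serverName).state == State.ONLINE;
  }
}
```

In this model a state check that goes through getOrCreate reports the long-dead server as ONLINE, so a waiter keyed on that state would never be woken.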

It is sort of wonky and not 'usual' but I've been trashing my cluster and then 
trying to repair with hbck2. This is how I run into the odd state reported 
above.  In particular, on start, the load of meta will put all regions into 
RIT. If no online server is associated, then the regions are considered STUCK. I 
then do a bulk assign or unassign of the OPENING/CLOSING regions to clean up 
the RITs (tens of thousands on this big cluster), and then I run into the 
issue described here, where a bunch of unassigns end up suspended, never to be 
woken up.

A test would be sort of tough given the state is not normal.

Thanks [~allan163]


was (Author: stack):
Scenario is this:

 * meta has regions that reference a regionserver that is long gone. It was 
processed (or not if all MasterProcWALs have been removed) many restarts ago.
 * The table is borked. Some regions are not unassigned though their table is.
 * We run a mass unassign. Because table has many unassigned regions, it takes 
a while.
 * The first unassign queues a SCP for the long-dead server. It quickly runs 
through the SCP and finishes.. no logs to split.
 * Soon after, another scheduled unassign for same server is run. It queues an 
SCP (remember, if the unassign is against a server that is not online, we queue 
SCP and then wait on the SCP to wake the unassign so we do proper unassign 
cleanup in the handleRIT callback)... only in this case, the server is in the 
deadserver list and has been processed so this last assign just hangs for 
ever because the check for server state creates a new serverstatenode and new 
serverstatenodes default ONLINE.

It is sort of wonky and not 'usual' but I've been trashing my cluster and then 
trying to repair with hbck2. This is how I run into the odd state reported 
above.

A test would be sort of tough given the state is not normal.

Thanks [~allan163]

> [amv2] Revived deadservers; recreated serverstatenode
> -
>
> Key: HBASE-21259
> URL: https://issues.apache.org/jira/browse/HBASE-21259
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
>
> On startup, I see servers being revived; i.e. their serverstatenode is 
> getting marked online even though it's just been processed by 
> ServerCrashProcedure. It looks like this (in a patched server that reports on 
> whenever a serverstatenode is created):
> {code}
> 2018-09-29 03:45:40,963 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, 
> state=SUCCESS; ServerCrashProcedure 
> server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, 
> meta=false in 1.0130sec
> ...
> 2018-09-29 03:45:43,733 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! 
> vb1442.halxg.cloudera.com,22101,1536675314426
> java.lang.RuntimeException: WHERE AM I?
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369)
> at 
> 

[jira] [Commented] (HBASE-21259) [amv2] Revived deadservers; recreated serverstatenode

2018-10-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634569#comment-16634569
 ] 

stack commented on HBASE-21259:
---

Scenario is this:

 * meta has regions that reference a regionserver that is long gone. It was 
processed (or not if all MasterProcWALs have been removed) many restarts ago.
 * The table is borked. Some regions are not unassigned though their table is.
 * We run a mass unassign. Because table has many unassigned regions, it takes 
a while.
 * The first unassign queues a SCP for the long-dead server. It quickly runs 
through the SCP and finishes.. no logs to split.
 * Soon after, another scheduled unassign for same server is run. It queues an 
SCP (remember, if the unassign is against a server that is not online, we queue 
SCP and then wait on the SCP to wake the unassign so we do proper unassign 
cleanup in the handleRIT callback)... only in this case, the server is in the 
deadserver list and has been processed so this last assign just hangs for 
ever because the check for server state creates a new serverstatenode and new 
serverstatenodes default ONLINE.

It is sort of wonky and not 'usual' but I've been trashing my cluster and then 
trying to repair with hbck2. This is how I run into the odd state reported 
above.

A test would be sort of tough given the state is not normal.

Thanks [~allan163]

> [amv2] Revived deadservers; recreated serverstatenode
> -
>
> Key: HBASE-21259
> URL: https://issues.apache.org/jira/browse/HBASE-21259
> Project: HBase
>  Issue Type: Bug
>  Components: amv2
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
>
> On startup, I see servers being revived; i.e. their serverstatenode is 
> getting marked online even though it's just been processed by 
> ServerCrashProcedure. It looks like this (in a patched server that reports on 
> whenever a serverstatenode is created):
> {code}
> 2018-09-29 03:45:40,963 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, 
> state=SUCCESS; ServerCrashProcedure 
> server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, 
> meta=false in 1.0130sec
> ...
> 2018-09-29 03:45:43,733 INFO 
> org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! 
> vb1442.halxg.cloudera.com,22101,1536675314426
> java.lang.RuntimeException: WHERE AM I?
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143)
> at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464)
> at 
> org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369)
> at 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
> at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1716)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1494)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2022)
> {code}
> See how we've just finished an SCP, which will have removed the 
> serverstatenode... but then we come across an unassign that references the 
> server that was just processed. The unassign attempts to update the 
> serverstatenode, and in doing so we create one if none is present. We 
> shouldn't be creating one.
> I think I see this a lot because I am scheduling unassigns with hbck2. The 
> servers crash and then come back up, with SCPs cleaning up the old server 
> while unassign procedures are still waiting in the procedure executor queue, 
> but it could happen at any time on a cluster should an unassign happen to get 
> scheduled near an SCP.
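
The race described above can be sketched in isolation. This is only an illustration under assumptions: `ServerNodeRegistrySketch` and its methods are hypothetical names, not the real `RegionStates` API. The point it shows is that a lookup-only path lets a late unassign notice the server is gone instead of silently recreating its node.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the master's per-server bookkeeping.
// It is NOT the HBase RegionStates class; all names here are assumptions.
final class ServerNodeRegistrySketch {
    // server name -> names of regions recorded against that server
    private final Map<String, Set<String>> serverNodes = new ConcurrentHashMap<>();

    /** The only place a node is created, e.g. when a server checks in. */
    void createServer(String serverName) {
        serverNodes.putIfAbsent(serverName, ConcurrentHashMap.newKeySet());
    }

    /** What crash processing would do once it has finished with a server. */
    void removeServer(String serverName) {
        serverNodes.remove(serverName);
    }

    /**
     * Lookup-only update: returns false when the server node is gone
     * rather than creating a fresh node for a dead server.
     */
    boolean addRegionToServer(String serverName, String regionName) {
        Set<String> regions = serverNodes.get(serverName);
        if (regions == null) {
            return false; // caller decides how to handle a vanished server
        }
        regions.add(regionName);
        return true;
    }

    boolean hasServer(String serverName) {
        return serverNodes.containsKey(serverName);
    }
}
```

With this shape, an unassign that runs after `removeServer` gets `false` back and no stale node reappears. How the real procedure should react to that case is outside what this sketch covers.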



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21262) [hbck2] AMv2 Lock Picker

2018-10-01 Thread stack (JIRA)
stack created HBASE-21262:
-

 Summary: [hbck2] AMv2 Lock Picker
 Key: HBASE-21262
 URL: https://issues.apache.org/jira/browse/HBASE-21262
 Project: HBase
  Issue Type: Sub-task
  Components: hbck2, Operability
Reporter: stack
Assignee: stack
 Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3


This issue is about adding a lock picker to the HbckService.

Over the weekend I had an interesting case where an enable failed -- a 
subprocedure ran into an exclusive lock (I think) -- and then the parent enable 
table procedure tried to roll back. The rollback threw CODE-BUG because some 
subprocedures were in unrollbackable states, so we ended up skipping out of the 
enable table procedure. The enable table procedure was marked ROLLBACKED... so 
it got GC'd. But the exclusive lock it had on the table stayed in place.

The above has to be fixed, but for the future we need a way to kill locks; 
otherwise the only alternative is removing the master proc WAL files -- which 
makes restoring good state a bigger pain.
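
To make the idea concrete, here is a minimal sketch of what a lock picker could look like, assuming a simple map from table name to the pid holding the exclusive lock. `TableLockPickerSketch`, `tryLock`, and `pickLock` are hypothetical names; none of this is the HbckService or procedure scheduler API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of exclusive table locks keyed by table name.
// Names and structure are assumptions made for illustration only.
final class TableLockPickerSketch {
    // table name -> pid of the procedure holding the exclusive lock
    private final Map<String, Long> exclusiveLocks = new ConcurrentHashMap<>();

    /** Normal acquisition: succeeds only if nobody holds the lock. */
    boolean tryLock(String table, long pid) {
        return exclusiveLocks.putIfAbsent(table, pid) == null;
    }

    /** Normal release: only the owning pid can release. */
    boolean unlock(String table, long pid) {
        return exclusiveLocks.remove(table, pid);
    }

    /** Reports who holds the lock, if anyone. */
    Optional<Long> holder(String table) {
        return Optional.ofNullable(exclusiveLocks.get(table));
    }

    /**
     * Operator escape hatch: drop the lock regardless of owner and
     * return the pid that was holding it so it can be logged.
     */
    Optional<Long> pickLock(String table) {
        return Optional.ofNullable(exclusiveLocks.remove(table));
    }
}
```

In the scenario above, the GC'd enable table procedure could never call `unlock`, so an operator would call `pickLock` on the table and then retry the enable.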





[jira] [Commented] (HBASE-21242) [amv2] Miscellaneous minor log and assign procedure create improvements

2018-10-01 Thread Mike Drob (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634272#comment-16634272
 ] 

Mike Drob commented on HBASE-21242:
---

bq. A proc toString includes procId
Ah, ok, we're fine then. +1

> [amv2] Miscellaneous minor log and assign procedure create improvements
> ---
>
> Key: HBASE-21242
> URL: https://issues.apache.org/jira/browse/HBASE-21242
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, Operability
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.001.patch, HBASE-21242.branch-2.1.001.patch, 
> HBASE-21242.branch-2.1.002.patch
>
>
> Some minor fixups:
> {code}
> For RIT Duration, do better than print ms/seconds. Remove redundant UI
> column dedicated to duration when we log it in the status field too.
> Make bypass log at INFO level -- when DEBUG we can miss important
> fixup detail like why we failed.
> Make it so on complete of subprocedure, we note count of outstanding
> siblings so we have a clue how much further the parent has to go before
> it is done (Helpful when hundreds of servers doing SCP).
> Have the SCP run the AP preflight check before creating an AP; saves
> creation of hundreds of thousands of APs during fixup of this big cluster
> of mine.
> Don't log tablename three times when reporting remote call failed.
> If lock is held already, note who has it. Also log after we get lock
> or if we have to wait rather than log on entrance though we may
> later have to wait (or we may have just picked up the lock).
> {code}
> Posting patch in a sec but let me try it on cluster too.
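
For the RIT duration item in the list above, a friendlier rendering than raw ms/seconds might look like the sketch below. The class name `RitDurationSketch` and the exact output format are assumptions; the attached patch may format durations differently.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical formatter; the output format is an assumption, not what the patch does.
final class RitDurationSketch {
    /** Renders a non-negative millisecond count as hours, minutes and seconds. */
    static String humanReadable(long millis) {
        Duration d = Duration.ofMillis(millis);
        long hours = d.toHours();
        long minutes = d.toMinutes() % 60;
        long seconds = d.getSeconds() % 60;
        long ms = d.toMillis() % 1000;

        StringBuilder sb = new StringBuilder();
        if (hours > 0) {
            sb.append(hours).append("hrs, ");
        }
        if (hours > 0 || minutes > 0) {
            sb.append(minutes).append("mins, ");
        }
        // Seconds always appear, with millisecond precision.
        sb.append(seconds).append('.')
          .append(String.format(Locale.ROOT, "%03d", ms))
          .append("sec");
        return sb.toString();
    }
}
```

For example, `RitDurationSketch.humanReadable(3_725_004L)` yields `1hrs, 2mins, 5.004sec`, which is easier to scan in a status field than a seven-digit millisecond count.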





[jira] [Commented] (HBASE-21245) Add exponential backoff when retrying for sync replication related procedures

2018-10-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633895#comment-16633895
 ] 

Hudson commented on HBASE-21245:


Results for branch master
[build #520 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/520/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/520//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/520//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/520//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Add exponential backoff when retrying for sync replication related procedures
> -
>
> Key: HBASE-21245
> URL: https://issues.apache.org/jira/browse/HBASE-21245
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Duo Zhang
>Assignee: Guanghao Zhang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21245.master.001.patch, 
> HBASE-21245.master.002.patch, HBASE-21245.master.003.patch, 
> HBASE-21245.master.004.patch
>
>






[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-10-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633859#comment-16633859
 ] 

Hudson commented on HBASE-21207:


Results for branch branch-1
[build #485 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/485/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/485//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/485//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/485//JDK8_Nightly_Build_Report_(Hadoop2)/]




(x) {color:red}-1 source release artifact{color}
-- See build output for details.


> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 21207.branch-1.addendum.patch, 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, 
> HBASE-21207-branch-1.patch, HBASE-21207-branch-1.v1.patch, 
> HBASE-21207-branch-2.v1.patch, HBASE-21207.patch, HBASE-21207.patch, 
> HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details such as requests per 
> second and number of regions. Similarly, for tables we can see online and 
> offline regions.
> It would help ops people identify hot spotting if we provided sort 
> functionality in the UI.





[jira] [Commented] (HBASE-21207) Add client side sorting functionality in master web UI for table and region server details.

2018-10-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633831#comment-16633831
 ] 

Hudson commented on HBASE-21207:


Results for branch branch-1.4
[build #487 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/487/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/487//General_Nightly_Build_Report/]


(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/487//JDK7_Nightly_Build_Report/]


(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/487//JDK8_Nightly_Build_Report_(Hadoop2)/]




(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Add client side sorting functionality in master web UI for table and region 
> server details.
> ---
>
> Key: HBASE-21207
> URL: https://issues.apache.org/jira/browse/HBASE-21207
> Project: HBase
>  Issue Type: Improvement
>  Components: master, monitoring, UI, Usability
>Reporter: Archana Katiyar
>Assignee: Archana Katiyar
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0, 1.4.8
>
> Attachments: 14926e82-b929-11e8-8bdd-4ce4621f1118.png, 
> 21207.branch-1.addendum.patch, 2724afd8-b929-11e8-8171-8b5b2ba3084e.png, 
> HBASE-21207-branch-1.patch, HBASE-21207-branch-1.v1.patch, 
> HBASE-21207-branch-2.v1.patch, HBASE-21207.patch, HBASE-21207.patch, 
> HBASE-21207.v1.patch, edc5c812-b928-11e8-87e2-ce6396629bbc.png
>
>
> In the Master UI, we can see region server details such as requests per 
> second and number of regions. Similarly, for tables we can see online and 
> offline regions.
> It would help ops people identify hot spotting if we provided sort 
> functionality in the UI.





[jira] [Commented] (HBASE-21250) Refactor WALProcedureStore and add more comments for better understanding the implementation

2018-10-01 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633819#comment-16633819
 ] 

Hadoop QA commented on HBASE-21250:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 29s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 27s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 45s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 15s{color} | {color:green} hbase-procedure: The patch generated 0 new + 25 unchanged - 21 fixed = 25 total (was 46) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 17s{color} | {color:green} The patch passed checkstyle in hbase-server {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 44s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 12m 14s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  5s{color} | {color:green} hbase-procedure in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}138m  1s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}189m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21250 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941944/HBASE-21250-v2.patch |
| Optional Tests |  dupname  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux a5bf245c46b1 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 

[jira] [Commented] (HBASE-21250) Refactor WALProcedureStore and add more comments for better understanding the implementation

2018-10-01 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633649#comment-16633649
 ] 

Duo Zhang commented on HBASE-21250:
---

There are already plenty of comments for the WALProcedureMap, and [~stack] has 
already played with it in the past, so I think it is fine for now. Let's finish 
this issue, and then I will try to fix the fencing (i.e., the recover lease 
problem) and also HBASE-21254.

Ping [~allan163] [~stack] for reviewing.

Thanks.

> Refactor WALProcedureStore and add more comments for better understanding the 
> implementation
> 
>
> Key: HBASE-21250
> URL: https://issues.apache.org/jira/browse/HBASE-21250
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21250-v1.patch, HBASE-21250-v2.patch, 
> HBASE-21250.patch
>
>
> The implementation is complicated and lacks comments explaining how it works.
> {code}
> /**
>  * WAL implementation of the ProcedureStore.
>  * @see ProcedureWALPrettyPrinter for printing content of a single WAL.
>  * @see #main(String[]) to parse a directory of MasterWALProcs.
>  */
> {code}
> I think at least we can move sub classes to separate files to make the class 
> smaller, and add more comments to describe what is going on here.





[jira] [Updated] (HBASE-21250) Refactor WALProcedureStore and add more comments for better understanding the implementation

2018-10-01 Thread Duo Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21250:
--
Attachment: HBASE-21250-v2.patch

> Refactor WALProcedureStore and add more comments for better understanding the 
> implementation
> 
>
> Key: HBASE-21250
> URL: https://issues.apache.org/jira/browse/HBASE-21250
> Project: HBase
>  Issue Type: Sub-task
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21250-v1.patch, HBASE-21250-v2.patch, 
> HBASE-21250.patch
>
>
> The implementation is complicated and lacks comments explaining how it works.
> {code}
> /**
>  * WAL implementation of the ProcedureStore.
>  * @see ProcedureWALPrettyPrinter for printing content of a single WAL.
>  * @see #main(String[]) to parse a directory of MasterWALProcs.
>  */
> {code}
> I think at least we can move sub classes to separate files to make the class 
> smaller, and add more comments to describe what is going on here.


