[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594362#comment-16594362 ]

Mingliang Liu commented on HBASE-20642:
---

Thanks [~an...@apache.org]. After discussion with [~apurtell] on the related work in HBASE-20408, I think this will fix the unexpected client exception on branch-1 as well. I can help review the patch for branch-1.

> IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
> -
>
> Key: HBASE-20642
> URL: https://issues.apache.org/jira/browse/HBASE-20642
> Project: HBase
> Issue Type: Bug
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20642.001.patch, HBASE-20642.002.patch, HBASE-20642.patch
>
>
> [~romil.choksi] reported that IntegrationTestDDLMasterFailover is failing
> while adding column family during the time master is restarting.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594353#comment-16594353 ]

Ankit Singhal commented on HBASE-20642:
---

bq. Does it make sense to branch-1 as well? Thanks.

[~liuml07], [~elserj]: yes, the nonce changes are applicable, because the handling of retries with a nonce is also incorrect in branch-1. Do you want me to backport it to branch-1?
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594269#comment-16594269 ]

Josh Elser commented on HBASE-20642:

{quote}Does it make sense to branch-1 as well? Thanks. {quote}

My thought was that this fix was predicated on some hbase-2.x nonce changes, but I honestly don't remember anymore.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16592168#comment-16592168 ]

Mingliang Liu commented on HBASE-20642:
---

Does it make sense to branch-1 as well? Thanks. [~elserj] [~an...@apache.org]
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519350#comment-16519350 ]

Hudson commented on HBASE-20642:

Results for branch branch-2.0 [build #453 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/453/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/453//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/453//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/453//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518938#comment-16518938 ]

Hudson commented on HBASE-20642:

Results for branch branch-2 [build #886 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/886/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/886//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/886//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/886//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518855#comment-16518855 ]

stack commented on HBASE-20642:
---

+1 for 2.0.2
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518852#comment-16518852 ]

Hudson commented on HBASE-20642:

Results for branch master [build #371 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/371/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/371//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/371//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/371//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518646#comment-16518646 ]

Josh Elser commented on HBASE-20642:

[~stack], just to make sure: you'd like this for branch-2.0?
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518635#comment-16518635 ]

Josh Elser commented on HBASE-20642:

+1 from me too. Let me help by merging.

Thanks for helping in reviews, [~stack]. Thanks for the great work, Ankit. I'll fix the checkstyle on commit.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517917#comment-16517917 ]

Hadoop QA commented on HBASE-20642:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 43s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 1s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 14s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 57s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 59s{color} | {color:red} hbase-server: The patch generated 1 new + 159 unchanged - 14 fixed = 160 total (was 173) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 4s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 54s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}152m 32s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}200m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20642 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928434/HBASE-20642.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 317c81d12dc5 4.4.0-104-generic #127-Ubuntu SMP Mon Dec 11 12:16:42 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 9101fc246f |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297;
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517820#comment-16517820 ]

stack commented on HBASE-20642:
---

Skimmed. +1
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517810#comment-16517810 ]

Ankit Singhal commented on HBASE-20642:
---

Thanks [~stack] and [~elserj] for all the help in getting this fixed.

{quote}Doing nonce creation outside the call method is what makes it so we don't make a new nonce on each invocation? {quote}
Yes, so that the call method is invoked with the same nonce on retries.

{quote}So, the patch here makes it so call could complete even though Master was restarted while call was going on (as long as new Master came up before timeout)? {quote}
Yes, the test restarts the procedure executor after every step to simulate a master failure.

{quote}Bit of doc for this... public interface StepHook{ ? {quote}
Done in the latest patch. Fixed the javac, whitespace and checkstyle warnings as well.

{quote}I learned/re-learned stuff reviewing this work. {quote}
Same here, great learnings for me as well.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517787#comment-16517787 ]

Josh Elser commented on HBASE-20642:

{quote}Doing nonce creation outside the call method is what makes it so we don't make a new nonce on each invocation? {quote}
{quote}So, the patch here makes it so call could complete even though Master was restarted while call was going on (as long as new Master came up before timeout)? {quote}
That's what I'm seeing with this.

I think after the javadoc for StepHook (and the corresponding {{MasterProcedureTestingUtility}} usage), this is good to go, too.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516676#comment-16516676 ]

stack commented on HBASE-20642:
---

Doing nonce creation outside the call method is what makes it so we don't make a new nonce on each invocation?

    Long nonceGroup = ng.getNonceGroup();
    Long nonce = ng.newNonce();

If so, thank you for fixing a bunch of mangled calls.

Bit of doc for this... public interface StepHook{ ?

So, the patch here makes it so call could complete even though Master was restarted while call was going on (as long as new Master came up before timeout)?

Patch looks good to me [~an...@apache.org]. Nice work sir. I learned/re-learned stuff reviewing this work. Thanks.
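To make the question about nonce creation concrete, here is a minimal sketch of the two patterns. It is not taken from the patch: `SketchNonceGenerator` and `NonceRetrySketch` are hypothetical names, and the retry loop stands in for whatever retrying caller invokes the call method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a client-side nonce generator (not the HBase class).
final class SketchNonceGenerator {
    private final AtomicLong next = new AtomicLong(1);

    long newNonce() {
        return next.getAndIncrement();
    }
}

final class NonceRetrySketch {
    // Problematic pattern: the nonce is created inside the retried body,
    // so every attempt reaches the server with a different nonce.
    static List<Long> noncePerAttempt(SketchNonceGenerator ng, int attempts) {
        List<Long> sent = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            long nonce = ng.newNonce(); // regenerated on each retry
            sent.add(nonce);
        }
        return sent;
    }

    // Pattern discussed in the review: the nonce is created once, before the
    // retry loop, so every attempt carries the same nonce.
    static List<Long> nonceHoisted(SketchNonceGenerator ng, int attempts) {
        long nonce = ng.newNonce(); // created once per logical operation
        List<Long> sent = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            sent.add(nonce);
        }
        return sent;
    }
}
```

With three attempts, the first method sends three distinct nonces and the second sends the same nonce three times, which is what lets a server recognize the retries as one request.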
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516281#comment-16516281 ]

Hadoop QA commented on HBASE-20642:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 48s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 53s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 57s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 47s{color} | {color:red} hbase-server generated 1 new + 187 unchanged - 1 fixed = 188 total (was 188) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 12s{color} | {color:red} hbase-server: The patch generated 4 new + 159 unchanged - 14 fixed = 163 total (was 173) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 5 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 58s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 21s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 9s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}114m 36s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m 59s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-20642 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12928233/HBASE-20642.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 74f6b8e47935 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516023#comment-16516023 ]

Ankit Singhal commented on HBASE-20642:
---

[~stack], [~elserj], [~mdrob], attaching a patch for your review.

* Added a test case that reproduces the issue.
* Fixed client-side nonce generation for retry calls.
* Moved the pre-checks on the server under the nonce checks.
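The third item, moving the server-side pre-checks under the nonce checks, can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: `PreCheckOrderSketch` and its methods are hypothetical, the procedure is applied immediately on submit, and `IllegalStateException` stands in for the `InvalidFamilyOperationException` seen by the test.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of a master handling "add column family".
final class PreCheckOrderSketch {
    private final Set<String> families = new HashSet<>();
    private final Map<Long, Long> procIdByNonce = new HashMap<>();
    private long nextProcId = 1;

    // Pre-check first: a retry of a request that was already applied fails,
    // because the family now exists, even though the nonce is known.
    long addFamilyPreCheckFirst(String family, long nonce) {
        if (families.contains(family)) {
            throw new IllegalStateException("family '" + family + "' already exists");
        }
        Long existing = procIdByNonce.get(nonce);
        if (existing != null) {
            return existing;
        }
        return submit(family, nonce);
    }

    // Nonce check first: a retry with a known nonce returns the original
    // procedure id and never reaches the pre-check.
    long addFamilyNonceFirst(String family, long nonce) {
        Long existing = procIdByNonce.get(nonce);
        if (existing != null) {
            return existing;
        }
        if (families.contains(family)) {
            throw new IllegalStateException("family '" + family + "' already exists");
        }
        return submit(family, nonce);
    }

    // Registers the nonce and applies the change right away (a simplification).
    private long submit(String family, long nonce) {
        long procId = nextProcId++;
        procIdByNonce.put(nonce, procId);
        families.add(family);
        return procId;
    }
}
```

Calling `addFamilyNonceFirst("cf1", 42L)` twice returns the same procedure id, while the second call to `addFamilyPreCheckFirst("cf1", 42L)` throws, which mirrors the failure reported for the integration test.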
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508792#comment-16508792 ]

Josh Elser commented on HBASE-20642:

{quote}if we fix HBASE-20658 by releasing the latch after some pre-checks for Modify and Truncate table , then probably we may not need to do nonce check as retry mechanism will not kick in if the procedure is submitted successfully. {quote}

I think in both cases, we don't want to release those latches early (would break functionality of Truncate, and regress on HBASE-19953 for modify table).

However, using the same nonce from the client seems like the right fix to me. We have _one_ client and this client is (unwittingly) submitting the same procedure multiple times. I don't think I've seen any reason that this should be using a new nonce, so let's make the nonce help us in this known-deficient case :)
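As a rough illustration of why reusing the nonce helps when one client submits the same procedure several times, consider a registry keyed by nonce group and nonce. The class below is a hypothetical sketch, not HBase's nonce handling; the string key and the counter are assumptions made for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry: one procedure id per (nonceGroup, nonce) pair.
final class NonceRegistrySketch {
    private final Map<String, Long> procIdByNonceKey = new ConcurrentHashMap<>();
    private final AtomicLong nextProcId = new AtomicLong(1);

    // Returns the existing procedure id for a known nonce pair,
    // or allocates a new one for a pair seen for the first time.
    long submit(long nonceGroup, long nonce) {
        String key = nonceGroup + ":" + nonce;
        return procIdByNonceKey.computeIfAbsent(key, k -> nextProcId.getAndIncrement());
    }

    // Number of distinct procedures created so far.
    int procedureCount() {
        return procIdByNonceKey.size();
    }
}
```

Three submissions with the same pair create one procedure; three submissions with fresh nonces create three, which is the duplicate-submission case described above.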
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497129#comment-16497129 ]

stack commented on HBASE-20642:
---

bq. Thanks stack, What do you say for HBASE-20658?

Seems like latch is needed for whole call... (see [~Apache9] note). Thanks for keeping at this one.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497023#comment-16497023 ]

Ankit Singhal commented on HBASE-20642:
---

{quote}
This is a general problem with the synchronous calls; they are not built to migrate across server failure. Currently there is no connection between running procedure and client invocation other than the stalled call. Perhaps we could build some sort of tether but the thinking was that we'd move off these old-style (deprecated) synchronous calls to instead use async where we do have a connection between the invocation and the running procedure via the returned future.
{quote}

The current implementation of synchronous calls in HBase simulates the way a client handles the async calls, i.e. waiting for the future to return the result. So, except for the Modify and Truncate table procedures, the current mechanism is good: we submit the procedure and then periodically check for its completion in separate calls, which can handle migration of the master as well.

{quote}
Not to complete. The latch covers setup of the procedure only (A quote from HBASE-19953 suggests doc to make it clear that "the latch is just-for the Procedure preparation – that we are not blocking for the whole procedure run...")
{quote}

Yes, but for the Modify and Truncate table procedures only, the latch is released at the end of the procedure. Raised HBASE-20658 for that.

{quote}
Yeah, this is a problem (why have nonce's if client is doing this...). Does this break your suggested solution here? Or rather, it needs client changes too?
{quote}

We may not need client change if we fix

{quote}
Re-reading the description, how would ensuring nonce-respect help? We'll not resubmit the procedure but neither will we recognize its successful completion since it happens on the new master, not the old.
{quote}

For synchronous calls as well, we check for procedure completion by periodically requesting the procedure result from the server, so the call will learn that a new master has completed the procedure. The procedure is getting resubmitted for Modify and Truncate table because of HBASE-20658.

Just to summarize: if we fix HBASE-20658 by releasing the latch after some pre-checks for Modify and Truncate table, then we probably do not need the nonce check, as the retry mechanism will not kick in if the procedure is submitted successfully.

Thanks [~stack], what do you say for HBASE-20658?
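The latch timing discussed in this comment can be illustrated with a minimal sketch. It assumes a simplified three-step procedure; the class name, method name, and step names are hypothetical and are not taken from the HBase code base. It only shows at which step a waiting submit call would be unblocked under each policy.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of a procedure that releases a "prepare" latch. */
class LatchTimingSketch {

  /**
   * Runs a fake three-step procedure and returns the index of the step
   * at which the latch was released (i.e. when a blocked submit call
   * would return to the client).
   */
  static int run(boolean releaseAfterPrepare) {
    CountDownLatch latch = new CountDownLatch(1);
    // Step names are illustrative assumptions only.
    String[] steps = {"PREPARE", "UPDATE_DESCRIPTOR", "REOPEN_REGIONS"};
    int releasedAt = -1;
    for (int i = 0; i < steps.length; i++) {
      // ... the real work of step i would happen here ...
      boolean lastStep = (i == steps.length - 1);
      boolean releaseNow = releaseAfterPrepare ? (i == 0) : lastStep;
      if (releaseNow && latch.getCount() > 0) {
        latch.countDown();
        releasedAt = i;
      }
    }
    return releasedAt;
  }
}
```

With `run(true)` the submit call is unblocked right after the pre-checks, so a master crash in a later step cannot fail the submit RPC and trigger a client retry. With `run(false)` the call stays blocked until the last step, which is the behaviour the comment attributes to Modify and Truncate table.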
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495357#comment-16495357 ]

stack commented on HBASE-20642:
---

Re-reading the description, how would ensuring nonce-respect help? We'll not resubmit the procedure but neither will we recognize its successful completion since it happens on the new master, not the old.

Thanks.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495259#comment-16495259 ]

stack commented on HBASE-20642:
---

bq. we are already rebuilding in-memory nonce map from the persisted noncekey. so the retry should also get identified on the new master and ignored.

You are right. I forgot about this aspect. Updated our rough doc to talk about Nonce persistence https://docs.google.com/document/d/1QLXlVERKt5EMbx_EL3Y2u0j64FN-_TrVoM5WWxIXh6o/edit# (I need to wind it into our refguide).

bq. Though there is different problem with the client which is generating new noncekey for every retry.

Yeah, this is a problem (why have nonce's if client is doing this...). Does this break your suggested solution here? Or rather, it needs client changes too?

bq. We may not catch it If the master is killed, nevertheless user may not expect the exception if the procedure is completed successfully at the new master.

Yes. This is a general problem with the synchronous calls; they are not built to migrate across server failure. Currently there is no connection between running procedure and client invocation other than the stalled call. Perhaps we could build some sort of tether but the thinking was that we'd move off these old-style (deprecated) synchronous calls to instead use async where we do have a connection between the invocation and the running procedure via the returned future. Meantime we have this half-way situation where the client synchronous API fails but behind the scenes it may prevail. The circumstance should be rare in practice but yeah, what to do so we don't surprise the operator.

bq. It seems after HBASE-19953, Asynchronous call for DDLs also wait for the procedure to complete( as countDown() will happen when the procedure is completed)

Not to complete. The latch covers setup of the procedure only (A quote from HBASE-19953 suggests doc to make it clear that "the latch is just-for the Procedure preparation – that we are not blocking for the whole procedure run...")
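The client-side nonce problem raised above (a new nonce key per retry) can be shown with a small sketch. It is an illustration under stated assumptions, not the HBase client: the class, the counter-based nonce generator, and the method name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical retry loop; records which nonce each attempt would send. */
class RetryNonceSketch {
  // Stand-in for a nonce generator; real clients would not use a counter.
  private static final AtomicLong NONCES = new AtomicLong(100);

  /** Returns the nonce sent on each of the given number of attempts. */
  static List<Long> noncesSent(int attempts, boolean reuseNonce) {
    List<Long> sent = new ArrayList<>();
    // Generated once per logical call, before any retry happens.
    long perCallNonce = NONCES.incrementAndGet();
    for (int i = 0; i < attempts; i++) {
      // A fresh nonce per attempt makes every retry look like a new request.
      sent.add(reuseNonce ? perCallNonce : NONCES.incrementAndGet());
    }
    return sent;
  }
}
```

Only when all attempts carry the same nonce can a server-side nonce check recognise the second attempt as a retry of the first.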
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494416#comment-16494416 ]

Ankit Singhal commented on HBASE-20642:
---

bq. That is not my understanding. The nonces are in an in-memory-only map in the Master process. They will not be migrated from one Master to the new one so, even if you put calls behind a nonce-check, it'll fail since the nonce-map is empty on new Master.

I just checked the code and see that while loading procedures (ProcedureExecutor#loadProcedures) from MasterProcWals during restart, we are already rebuilding the in-memory nonce map from the persisted nonce key, so the retry should also get identified on the new master and ignored. There is a different problem, though, with the client, which is generating a new nonce key for every retry.

bq. Because the Master is failing which broke the synchronous wait on add column? Maybe add a check if master is going down and if it is throw that for an exception instead of doing this pre-flight check against current state of table descriptor? Would that be more meaningful?

We may not catch it if the master is killed; nevertheless, the user would not expect the exception if the procedure completed successfully on the new master.

bq. It is pretty cool that the call keeps going though the Master has crashed... I think it is a bit much to expect that this call can pick up where it left off on the old Master though. It has no reference to the original transaction (it does not have a Future ).

The retry call was moved to the new master, and during initialization the new master will pick up the procedure from the state last persisted by the old master in the ProcWals and ignore the retry. So we should not have any problem if the operations are idempotent for a state. Won't this process happen when the standby becomes active?

bq. We want to move folks over to the async calls where they check to see if the Procedure is completed.

It seems that after HBASE-19953, asynchronous calls for DDLs also wait for the procedure to complete (as countDown() will happen when the procedure is completed).

Thanks for bearing with me, I know there are too many questions.
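The nonce-map rebuild described in this comment can be sketched roughly as follows. This is a simplified model, not ProcedureExecutor#loadProcedures itself: the `PersistedProc` type, its fields, and the string key format are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of rebuilding a nonce map while replaying a procedure log. */
class NonceReplaySketch {

  /** Assumed shape of a persisted procedure: its id plus the nonce key it was submitted with. */
  static final class PersistedProc {
    final long procId;
    final long nonceGroup;
    final long nonce;

    PersistedProc(long procId, long nonceGroup, long nonce) {
      this.procId = procId;
      this.nonceGroup = nonceGroup;
      this.nonce = nonce;
    }
  }

  /** Rebuilds "nonceGroup:nonce" -> procId from the replayed procedures. */
  static Map<String, Long> rebuildNonceMap(List<PersistedProc> replayed) {
    Map<String, Long> nonceToProcId = new HashMap<>();
    for (PersistedProc p : replayed) {
      nonceToProcId.put(p.nonceGroup + ":" + p.nonce, p.procId);
    }
    return nonceToProcId;
  }
}
```

After such a rebuild, a retry that arrives at the new master with the same nonce key can be mapped back to the procedure that the old master had already accepted.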
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491404#comment-16491404 ]

stack commented on HBASE-20642:
---

bq. If the master is swapped, nonces map will be rebuilt from uncompleted procedure during the replay so we should not have a problem checking on the new master as well. right?

That is not my understanding. The nonces are in an in-memory-only map in the Master process. They will not be migrated from one Master to the new one so, even if you put calls behind a nonce-check, it'll fail since the nonce-map is empty on new Master.

bq. Yes, they will get this on their first submission if the master goes down in between.

Because the Master is failing which broke the synchronous wait on add column? Maybe add a check if master is going down and if it is throw that for an exception instead of doing this pre-flight check against current state of table descriptor? Would that be more meaningful?

bq. This is addColumnFamily() synchronous call and it is getting moved to the new master.

It is pretty cool that the call keeps going though the Master has crashed... I think it is a bit much to expect that this call can pick up where it left off on the old Master though. It has no reference to the original transaction (it does not have a Future). We want to move folks over to the async calls where they check to see if the Procedure is completed. That's the style we'd prefer. Meantime, I agree this exception message is confusing. Let's fix it (see above for suggestion).

bq. No problem, probably I'm not putting the problem in right words stack

Nah. I think it's the receiving end that has the problem (smile).

Thanks.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491398#comment-16491398 ]

Ankit Singhal commented on HBASE-20642:
---

bq. On their first submission, they get this? I don't follow. Is it something to do w/ Master going down?

Yes, they will get this on their first submission if the master goes down in between.

bq. What retrying mechanism is this? Is this the (deprecated) addColumn – synchronous call? Are you seeing the call move from the dead Master to the new Master?

This is the synchronous addColumnFamily() call, and it is getting moved to the new master.

bq. I like the idea of putting all behind Nonces but nonce are no good if the Master is swapped during the call?

If the master is swapped, the nonce map will be rebuilt from the uncompleted procedures during the replay, so we should not have a problem checking on the new master as well. Right?

{quote}Thanks for helping me understand.{quote}

No problem, probably I'm not putting the problem in the right words [~stack]
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491374#comment-16491374 ]

stack commented on HBASE-20642:
---

bq. , it's just user will get InvalidFamilyOperationException even for the first attempt only.

On their first submission, they get this? I don't follow. Is it something to do w/ Master going down?

bq. It's actually not the user, user is making a call only once but HBase client itself retries the call while master is restarting and if master come back in between and the procedure is completed, user will see InvalidFamilyOperationException because HBase consider it as a second call from the user although it is coming as part of retry by HBase client.

What retrying mechanism is this? Is this the (deprecated) addColumn -- synchronous call? Are you seeing the call move from the dead Master to the new Master?

I like the idea of putting all behind Nonces but nonce are no good if the Master is swapped during the call?

Thanks for helping me understand.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491348#comment-16491348 ]

Ankit Singhal commented on HBASE-20642:
---

bq. The procedure was submitted, right, and started to make progress (it got as far as changing the table descriptor?). Did the procedure not succeed? Though there was a crash of Master in the middle of its running? If it did not complete, that is a problem.

bq. Sounds like the original procedure did not complete? Is that so? That it died in the middle of its running and so you tried to resubmit the add column... but it fails because the original procedure died half-way through? Is this what is happening?

No, the procedure will eventually succeed after replaying the procedure WALs; it's just that the user will get InvalidFamilyOperationException even on the first attempt.

bq. You mean, a user will retry because they think their original submission did not take? In this case, if a Procedure in-flight modifying the table, this second submission should fail.

It's actually not the user. The user makes the call only once, but the HBase client itself retries the call while the master is restarting. If the master comes back in between and the procedure has completed, the user will see InvalidFamilyOperationException, because HBase considers it a second call from the user although it arrives as part of a retry by the HBase client.

So the patch moves all the checks into the Procedure, so that we do the nonce check, to differentiate a retry from a new call, before actually executing them.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491340#comment-16491340 ]

stack commented on HBASE-20642:
---

bq. so as the call is not completed,

The procedure was submitted, right, and started to make progress (it got as far as changing the table descriptor?). Did the procedure not succeed? Though there was a crash of Master in the middle of its running? If it did not complete, that is a problem.

bq. HBase client will retry the call but it will fail with InvalidFamilyOperationException because we don't differentiate if it is retry or a new call.

You mean, a user will retry because they think their original submission did not take? In this case, if a Procedure in-flight modifying the table, this second submission should fail.

Sounds like the original procedure did not complete? Is that so? That it died in the middle of its running and so you tried to resubmit the add column... but it fails because the original procedure died half-way through? Is this what is happening?

Thanks.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491337#comment-16491337 ]

Ankit Singhal commented on HBASE-20642:
---

Thanks [~mdrob] for taking a look.

bq. I think the fix is correct, but I also think we need a unit test before we can commit this.
bq. Take a look at ModifyTableProcedure::testRecoveryAndDoubleExecutionOnline
bq. Need to do something similar, probably add another method in ProcedureTestingUtility similar to setKillBeforeStoreUpdate, but to kill at whatever point breaks this?

Let me try to add a test.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491329#comment-16491329 ]

Ankit Singhal commented on HBASE-20642:
---

bq. I don't see what is wrong. You are trying to modify a table adding a column but you can't because a previous attempt succeeded?

Actually, the scenario is this: a user tries to add a column family to the table, but the master goes down after completing half of the states in the procedure (let's assume the column family was already updated in the table descriptor). Since the call has not completed, the HBase client will retry it, but the retry will fail with InvalidFamilyOperationException because we don't differentiate between a retry and a new call.
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491313#comment-16491313 ]

stack commented on HBASE-20642:
---

I don't see what is wrong. You are trying to modify a table adding a column but you can't because a previous attempt succeeded?

Thanks [~an...@apache.org]
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491192#comment-16491192 ]

Mike Drob commented on HBASE-20642:
---

I think the fix is correct, but I also think we need a unit test before we can commit this.

Take a look at ModifyTableProcedure::testRecoveryAndDoubleExecutionOnline

Need to do something similar, probably add another method in ProcedureTestingUtility similar to setKillBeforeStoreUpdate, but to kill at whatever point breaks this?
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491181#comment-16491181 ]

Josh Elser commented on HBASE-20642:
---

Given my understanding, I'd say +1

[~stack], [~uagashe], [~mdrob], any of you folks want to take a look?
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489891#comment-16489891 ]

Hadoop QA commented on HBASE-20642:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 13s | master passed |
| +1 | compile | 1m 57s | master passed |
| +1 | checkstyle | 1m 15s | master passed |
| +1 | shadedjars | 5m 7s | branch has no errors when building our shaded downstream artifacts. |
| 0 | findbugs | 2m 20s | hbase-server in master has 2 extant Findbugs warnings. |
| +1 | javadoc | 0m 33s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 58s | the patch passed |
| +1 | compile | 1m 49s | the patch passed |
| -1 | javac | 1m 49s | hbase-server generated 1 new + 187 unchanged - 1 fixed = 188 total (was 188) |
| +1 | checkstyle | 1m 15s | hbase-server: The patch generated 0 new + 148 unchanged - 14 fixed = 148 total (was 162) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 51s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 15m 11s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 12s | the patch passed |
| +1 | javadoc | 0m 31s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 129m 10s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 177m 27s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20642 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924993/HBASE-20642.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux c70bd2282c4a 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 1792f541c6 |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC3 |
| javac | https://builds.apache.org/job/PreCommit-HBASE-Build/12952/artifact/patchprocess/diff-compile-javac-hbase-server.txt |
| Test
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489628#comment-16489628 ]

Josh Elser commented on HBASE-20642:
---

Also, the fix makes sense to me (and reminds me of the last time I was in the procv2 code). The lack of a test is a smell, but I also don't have something in mind to suggest..
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489609#comment-16489609 ]

Josh Elser commented on HBASE-20642:
---

[~stack] ^
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489510#comment-16489510 ]

Ankit Singhal commented on HBASE-20642:
---

@stack, [~elserj], [~tedyu], can you please review the attached patch?
[jira] [Commented] (HBASE-20642) IntegrationTestDDLMasterFailover throws 'InvalidFamilyOperationException
[ https://issues.apache.org/jira/browse/HBASE-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489498#comment-16489498 ]

Ankit Singhal commented on HBASE-20642:
---

I analyzed the logs provided.

The client tried to add column family "cf-0544745230" to table "ittable-0455209020". Client logs:
{code}
2018-05-15 02:54:20,789|INFO|MainThread|machine.py:167 - run()||GUID=0022cef5-fb09-4e5e-bfad-5f239adfb691|2018-05-15 02:54:20,786 INFO [Thread-10] hbase.IntegrationTestDDLMasterFailover: Adding column family: {NAME => 'cf-0544745230', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} to table: ittable-0455209020
{code}

However, the master executing the procedure was restarted after the procedure had already updated the tableinfo in HDFS. Log of the master that was about to go down:
{code}
2018-05-15 02:54:21,862 INFO [PEWorker-8] assignment.RegionTransitionProcedure: Dispatch pid=16618, ppid=16338, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=ittable-0474715061, region=65e930848fdabc3fa93fc6c2ee8e9ca9, target=ctr-e138-1518143905142-311755-01-09.hwx.site,16020,1526352510710; rit=OPENING, location=ctr-e138-1518143905142-311755-01-09.hwx.site,16020,1526352510710
2018-05-15 02:54:25,908 INFO [main] master.HMaster: STARTING service HMaster
2018-05-15 02:54:20,790 INFO [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] master.HMaster: Client=hbase//172.27.24.220 modify ittable-0455209020
2018-05-15 02:54:21,849 INFO [PEWorker-2] util.FSTableDescriptors: Updated tableinfo=hdfs://ns1/apps/hbase/data/data/default/ittable-0455209020/.tabledesc/.tableinfo.03
{code}

The standby master then became active and resumed the procedure from the state recorded in the master procedure WALs. Standby master log:
{code}
2018-05-15 02:54:27,465 INFO [master/ctr-e138-1518143905142-311755-01-03:2] master.ActiveMasterManager: Registered as active master=ctr-e138-1518143905142-311755-01-03.hwx.site,2,1526352691422
2018-05-15 02:55:14,413 INFO [PEWorker-15] procedure2.ProcedureExecutor: Finished pid=16754, state=SUCCESS; ModifyTableProcedure table=ittable-0455209020 in 53.5830sec
{code}

As a result, the client's retry of the add-column-family request fails on the check below, because the table descriptor has already been updated by both masters.
{code}
@Override
public long addColumn(
    final TableName tableName,
    final ColumnFamilyDescriptor column,
    final long nonceGroup,
    final long nonce) throws IOException {
  checkInitialized();
  checkTableExists(tableName);
  TableDescriptor old = getTableDescriptors().get(tableName);
  if (old.hasColumnFamily(column.getName())) {
    throw new InvalidFamilyOperationException("Column family '" + column.getNameAsString()
        + "' in table '" + tableName + "' already exists so cannot be added");
  }
{code}

Failure at the client:
{code}
org.apache.hadoop.hbase.InvalidFamilyOperationException: org.apache.hadoop.hbase.InvalidFamilyOperationException: Column family 'cf-0544745230' in table 'ittable-0455209020' already exists so cannot be added
E at org.apache.hadoop.hbase.master.HMaster.addColumn(HMaster.java:2158)
{code}

The solution would be to perform all of these steps and checks after the nonce check, inside procedure execution, so that retries do not fail. Attaching a tentative fix.
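The ordering proposed above can be illustrated with a small stand-alone model. This is only a sketch and not the attached patch: the class `NonceRetrySketch`, its in-memory maps, the single `long` nonce (HBase uses a nonce group plus a nonce), and the use of `IllegalStateException` in place of `InvalidFamilyOperationException` are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a master handling addColumn retries.
// None of these names are HBase APIs.
class NonceRetrySketch {
    // Column families per table; stands in for the table descriptor stored in HDFS.
    private final Map<String, Set<String>> families = new HashMap<>();
    // Nonce -> procedure id; stands in for nonce bookkeeping that survives a failover.
    private final Map<Long, Long> procIdByNonce = new HashMap<>();
    private long nextProcId = 1;

    void createTable(String table) {
        families.put(table, new HashSet<>());
    }

    long addColumn(String table, String family, long nonce) {
        // Step 1: nonce check first. A retry of a request that was already
        // submitted gets the original procedure id back, and no precondition
        // check is run against the (already modified) descriptor.
        Long existing = procIdByNonce.get(nonce);
        if (existing != null) {
            return existing;
        }
        // Step 2: only a genuinely new request reaches the precondition checks.
        Set<String> cfs = families.get(table);
        if (cfs == null) {
            throw new IllegalStateException("Table '" + table + "' does not exist");
        }
        if (cfs.contains(family)) {
            throw new IllegalStateException("Column family '" + family
                + "' in table '" + table + "' already exists so cannot be added");
        }
        // Step 3: register the nonce and apply the change.
        long procId = nextProcId++;
        procIdByNonce.put(nonce, procId);
        cfs.add(family);
        return procId;
    }

    public static void main(String[] args) {
        NonceRetrySketch master = new NonceRetrySketch();
        master.createTable("ittable");
        long first = master.addColumn("ittable", "cf", 42L);
        // Same nonce: treated as a retry, so it does not fail on "already exists".
        long retry = master.addColumn("ittable", "cf", 42L);
        System.out.println("first=" + first + " retry=" + retry);
    }
}
```

In this model a retry carrying the same nonce is recognized before the "already exists" check runs, while a new request (different nonce) for an existing family is still rejected.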