[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16952464#comment-16952464 ]

HBase QA commented on HBASE-12125:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HBASE-12125 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/in-progress/precommit-patchnames for help. {color} |
\\ \\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-12125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895999/HBASE-12125.v4.master.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/957/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |

This message was automatically generated.

> Add Hbck option to check and fix WAL's from replication queue
> -------------------------------------------------------------
>
> Key: HBASE-12125
> URL: https://issues.apache.org/jira/browse/HBASE-12125
> Project: HBase
> Issue Type: Bug
> Components: hbck, hbck2, Replication
> Affects Versions: 3.0.0, 2.3.0, 1.6.0, hbase-operator-tools-1.1.0
> Reporter: Virag Kothari
> Assignee: Vincent Poon
> Priority: Critical
> Attachments: HBASE-12125.v1.master.patch, HBASE-12125.v2.master.patch,
> HBASE-12125.v3.master.patch, HBASE-12125.v4.master.patch
>
> The replication source will discard the WAL file in many cases when it
> encounters an exception reading it. This can cause data loss
> and the underlying reason of failed read remains hidden. Only in certain
> scenarios, the replication source should dump the current WAL and move to the
> next one.
> This JIRA aims to have an hbck option to check the WAL files of replication
> queues for any inconsistencies and also provide an option to fix it.
> The fix can be to remove the file from replication queue in zk and from the
> memory of replication source manager and replication sources.
> A region server endpoint call from the hbck client to region server can be
> used to achieve this.
> Hbck can be configured with the following options:
> -softCheckReplicationWAL : Tries to open only the oldest WAL (the WAL
> currently read by replication source) from replication queue. If there is a
> position associated, it also seeks to that position and reads an entry from
> there
> -hardCheckReplicationWAL: Check all WAL paths from replication queues by
> reading them completely to make sure they are ok.
> -fixMissingReplicationWAL: Remove the WAL's from replication queues which are
> not present on hdfs
> -fixCorruptedReplicationWAL: Remove the WAL's from replication queues which
> are corrupted (based on the findings from softCheck/hardCheck). Also the
> WAL's are moved to a quarantine dir
> -rollAndFixCorruptedReplicationWAL - If the current WAL is corrupted, it is
> first rolled over and then deals with it in the same way as
> -fixCorruptedReplicationWAL option

--
This message was sent by Atlassian Jira (v8.3.4#803005)
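The check and fix options quoted in the issue description can be read as a two-step process: classify each queued WAL, then decide which ones a fix pass would drop. What follows is only a rough sketch of that reading, not the code in the attached patches. Every name in it (`ReplicationWalCheckSketch`, `WalInspector`, `check`, `toRemove`, `fake`) is hypothetical, and it assumes the soft check looks at nothing but the oldest WAL while the hard check reads every queued WAL in full. Quarantining and WAL rolling are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from HBase or from the attached patches.
final class ReplicationWalCheckSketch {

    enum WalState { OK, MISSING, CORRUPTED }

    /** Stand-in for HDFS and WAL-reader access; an assumption, not an HBase API. */
    interface WalInspector {
        boolean exists(String wal);
        boolean canReadEntryAt(String wal, long position); // used by the soft check
        boolean canReadFully(String wal);                  // used by the hard check
    }

    /**
     * Soft mode inspects only the oldest WAL (index 0) at the saved position.
     * Hard mode reads every queued WAL completely.
     */
    static Map<String, WalState> check(List<String> queue, long position,
                                       boolean hard, WalInspector fs) {
        Map<String, WalState> states = new LinkedHashMap<>();
        int limit = hard ? queue.size() : Math.min(1, queue.size());
        for (int i = 0; i < limit; i++) {
            String wal = queue.get(i);
            if (!fs.exists(wal)) {
                states.put(wal, WalState.MISSING);
            } else {
                boolean readable = hard ? fs.canReadFully(wal)
                                        : fs.canReadEntryAt(wal, position);
                states.put(wal, readable ? WalState.OK : WalState.CORRUPTED);
            }
        }
        return states;
    }

    /** WALs a fix pass would drop from the queue, given which fix options are enabled. */
    static List<String> toRemove(Map<String, WalState> states,
                                 boolean fixMissing, boolean fixCorrupted) {
        List<String> remove = new ArrayList<>();
        for (Map.Entry<String, WalState> e : states.entrySet()) {
            if ((e.getValue() == WalState.MISSING && fixMissing)
                    || (e.getValue() == WalState.CORRUPTED && fixCorrupted)) {
                remove.add(e.getKey());
            }
        }
        return remove;
    }

    /** In-memory inspector for trying the sketch without a cluster. */
    static WalInspector fake(Set<String> missing, Set<String> corrupted) {
        return new WalInspector() {
            public boolean exists(String wal) { return !missing.contains(wal); }
            public boolean canReadEntryAt(String wal, long position) {
                return !corrupted.contains(wal);
            }
            public boolean canReadFully(String wal) { return !corrupted.contains(wal); }
        };
    }

    public static void main(String[] args) {
        List<String> queue = List.of("wal.1", "wal.2", "wal.3");
        WalInspector fs = fake(Set.of("wal.2"), Set.of("wal.3"));
        Map<String, WalState> hard = check(queue, 0L, true, fs);
        System.out.println(hard);
        System.out.println(toRemove(hard, true, false));
    }
}
```

Keeping classification separate from removal mirrors the split between the check options and the fix options in the description: a check-only run reports the states and changes nothing.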
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16271568#comment-16271568 ]

Vincent Poon commented on HBASE-12125:
--------------------------------------

This patch only adds replication fixes to hbck, which are independent of AMv2. So I think this part of hbck should still be valid.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238756#comment-16238756 ]

Hadoop QA commented on HBASE-12125:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 4s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} hbase-replication: The patch generated 0 new + 8 unchanged - 4 fixed = 8 total (was 12) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} hbase-server: The patch generated 0 new + 189 unchanged - 2 fixed = 189 total (was 191) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 41s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 47m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 13s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}119m 2s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}191m 18s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-12125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895999/HBASE-12125.v4.master.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 5a1b808294bb 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality |
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238284#comment-16238284 ]

stack commented on HBASE-12125:
-------------------------------

[~churromorales] HBCK presumes how assignment works. It also messes w/ hbase privates. In hbase2, assignment has been redone such that the Master's in-memory view is definitive -- no more state distributed over fs, zk, and master -- and Master effects any or all change. Also Master internals have changed. HBCK at a minimum no longer works and at worst, can actually do damage. TODO is an HBCK2. Shout if you need more detail sir.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238272#comment-16238272 ]

Andrew Purtell commented on HBASE-12125:
----------------------------------------

bq. Is there a reason all the fsck tests are ignored after the ProcedureV2 patch went in?

They were disabled for AMv2 actually. A lot of HBCK actions are not appropriate for AMv2. [~stack] can say more.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238224#comment-16238224 ]

churro morales commented on HBASE-12125:
----------------------------------------

Is there a reason all the fsck tests are ignored after the ProcedureV2 patch went in?
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236993#comment-16236993 ]

Ted Yu commented on HBASE-12125:
--------------------------------

From the QA run:
{code}
[ERROR] testFixMissingReplicationWAL(org.apache.hadoop.hbase.util.TestHBaseFsckReplication) Time elapsed: 54.85 s <<< ERROR!
java.lang.NullPointerException
	at org.apache.hadoop.hbase.util.TestHBaseFsckReplication.testFixMissingReplicationWAL(TestHBaseFsckReplication.java:184)
{code}
which was almost identical to the error I reported yesterday.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236990#comment-16236990 ]

Hadoop QA commented on HBASE-12125:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 9s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 25s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 12s{color} | {color:green} hbase-replication: The patch generated 0 new + 8 unchanged - 4 fixed = 8 total (was 12) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s{color} | {color:red} hbase-server: The patch generated 1 new + 189 unchanged - 2 fixed = 190 total (was 191) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 12s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 52m 45s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 14s{color} | {color:green} hbase-replication in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}126m 19s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}203m 2s{color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.util.TestHBaseFsckReplication |
| | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-12125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895502/HBASE-12125.v3.master.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 207597ce9dda 3.13.0-116-generic #163-Ubuntu SMP Fri Mar
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236859#comment-16236859 ]

Vincent Poon commented on HBASE-12125:
--------------------------------------

[~tedyu] I get the "java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()" from HADOOP-14930, how did you get around that? I used your command options and added "-Djetty.version=9.3.19.v20170502" and the test passed for me. Thanks!
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236662#comment-16236662 ]

Ted Yu commented on HBASE-12125:
--------------------------------

I use the following command options:
{code}
-Phadoop-3.0 -Dhadoop-three.version=3.0.0-beta1 -Dhadoop-two.version=3.0.0-beta1
{code}
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236644#comment-16236644 ]
Mike Drob commented on HBASE-12125:
You can build against hadoop3 by specifying {{-Dhadoop.profile=3}} in your Maven command line. I think that will get you 3.0.0-alpha4. You can also optionally specify {{-Dhadoop-three.version=3.0.0-beta1}}.
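As a rough illustration of Mike Drob's suggestion, the two properties can be combined into a single Maven invocation. This is only a sketch: the {{-Dtest=...}} filter is the standard Surefire way to select one test class, the class name is taken from the stack trace reported elsewhere in this thread, and the variable names are made up for the example. The script prints the command rather than running it, and it would presumably need to be run from the module that contains the test.

```shell
# Assumed values for illustration; adjust to the Hadoop 3 release under test.
HADOOP_THREE_VERSION="3.0.0-beta1"
TEST_CLASS="TestHBaseFsckReplication"

# -Dhadoop.profile=3 selects the Hadoop 3 build profile;
# -Dhadoop-three.version optionally overrides the default Hadoop 3 version.
# -Dtest restricts the run to a single test class (Surefire option).
MVN_CMD="mvn test -Dhadoop.profile=3 -Dhadoop-three.version=${HADOOP_THREE_VERSION} -Dtest=${TEST_CLASS}"

# Print the command so it can be reviewed before it is actually run.
echo "${MVN_CMD}"
```

Dropping the {{-Dhadoop-three.version}} property should fall back to the profile's default Hadoop 3 version, which Mike Drob believes is 3.0.0-alpha4.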
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236634#comment-16236634 ]
Vincent Poon commented on HBASE-12125:
[~tedyu] How do I test against hadoop3 - do I just change pom.xml "" to "${hadoop-three.version}" ? That passed for me.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234982#comment-16234982 ]
Ted Yu commented on HBASE-12125:
Running the new test against hadoop3 beta1, I got:
{code}
testFixMissingReplicationWAL(org.apache.hadoop.hbase.util.TestHBaseFsckReplication) Time elapsed: 49.211 sec <<< ERROR!
java.lang.NullPointerException
  at org.apache.hadoop.hbase.util.TestHBaseFsckReplication.testFixMissingReplicationWAL(TestHBaseFsckReplication.java:184)
{code}
See if the above can be reproduced on hadoop2.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233773#comment-16233773 ]
Hadoop QA commented on HBASE-12125:
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 4m 35s | Docker mode activated. |
|| || || || Prechecks ||
| 0 | findbugs | 0m 0s | Findbugs executables are not available. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 4m 25s | master passed |
| +1 | compile | 0m 47s | master passed |
| +1 | checkstyle | 1m 5s | master passed |
| +1 | shadedjars | 5m 14s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 32s | master passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 12s | Maven dependency ordering for patch |
| +1 | mvninstall | 4m 11s | the patch passed |
| +1 | compile | 0m 52s | the patch passed |
| +1 | javac | 0m 52s | the patch passed |
| -1 | checkstyle | 0m 10s | hbase-replication: The patch generated 5 new + 9 unchanged - 3 fixed = 14 total (was 12) |
| -1 | checkstyle | 1m 3s | hbase-server: The patch generated 23 new + 189 unchanged - 2 fixed = 212 total (was 191) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | shadedjars | 2m 59s | patch has 10 errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 42m 27s | Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha4. |
| +1 | javadoc | 0m 35s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 15s | hbase-replication in the patch passed. |
| +1 | unit | 88m 10s | hbase-server in the patch passed. |
| -1 | asflicense | 0m 37s | The patch generated 1 ASF License warnings. |
| | | 154m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-12125 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895082/HBASE-12125.v1.master.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 9f0908b24d09 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227696#comment-16227696 ]
Ted Yu commented on HBASE-12125:
Can you put the patch on Review Board?
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732053#comment-14732053 ]
Andrew Purtell commented on HBASE-12125:
For 0.98 you mean? If so, yes that branch is accepting improvements.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731652#comment-14731652 ]
Mikhail Antonov commented on HBASE-12125:
bq. I have a 0.98 patch for this. Will put for review.
Apparently the code base has moved forward since then, but I'm curious whether this patch would still be relevant and useful here?
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154441#comment-14154441 ]
Virag Kothari commented on HBASE-12125:
A WAL roll on the region server is required only if the current WAL (the one being written to) is corrupted. So fixCorruptedReplicationWAL is useful when we know that the WAL currently being written to is OK.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153884#comment-14153884 ]
Virag Kothari commented on HBASE-12125:
I have a 0.98 patch for this. Will put for review.
[jira] [Commented] (HBASE-12125) Add Hbck option to check and fix WAL's from replication queue
[ https://issues.apache.org/jira/browse/HBASE-12125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14153936#comment-14153936 ]
Andrew Purtell commented on HBASE-12125:
{quote}
-fixCorruptedReplicationWAL: Remove the WAL's from replication queues which are corrupted (based on the findings from softCheck/hardCheck). Also the WAL's are moved to a quarantine dir
-rollAndFixCorruptedReplicationWAL - If the current WAL is corrupted, it is first rolled over and then deals with it in the same way as -fixCorruptedReplicationWAL option
{quote}
When would we want fixCorruptedReplicationWAL instead of rollAndFixCorruptedReplicationWAL?