[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985239#comment-16985239 ]

Bo Cui commented on HBASE-22075:
--------------------------------

Would it be better to add the HBASE-23353 solution? Please check.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
>                 Key: HBASE-22075
>                 URL: https://issues.apache.org/jira/browse/HBASE-22075
>             Project: HBase
>          Issue Type: Bug
>          Components: mob
>    Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Critical
>              Labels: compaction, mob
>             Fix For: 2.0.7, 2.2.3, 2.1.9
>
>         Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during the last step (bulk load of a newly created reference file), there is a high chance of data loss due to a partially loaded reference file whose cells refer to a (now) non-existent MOB file. The newly created MOB file is deleted automatically when a MOB compaction fails, but some cells with references to this file might already have been loaded into HBase.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
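The failure mode in the issue description can be modelled in a few lines. This is only a toy sketch, not HBase code: the class name `DanglingMobRefs`, the file names, and the representation of references as plain strings are all assumptions made for illustration. It shows why deleting the new MOB file after a partial reference load leaves cells pointing at nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the data loss scenario; all names here are hypothetical. */
final class DanglingMobRefs {

    /** Returns every loaded reference whose target MOB file no longer exists. */
    static List<String> dangling(List<String> loadedRefs, Set<String> existingMobFiles) {
        List<String> missing = new ArrayList<>();
        for (String ref : loadedRefs) {
            if (!existingMobFiles.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The bulk load reached one region (now pointing at new-1) but failed
        // for another (still pointing at old-2).
        List<String> loadedRefs = Arrays.asList("new-1", "old-2");

        // The failed compaction then deleted new-1 during cleanup.
        Set<String> existing = new HashSet<>(Arrays.asList("old-1", "old-2"));

        System.out.println(dangling(loadedRefs, existing)); // prints [new-1]
    }
}
```

Any cell that resolves through a dangling reference can no longer return its value, which is the data loss reported here.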
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960770#comment-16960770 ]

HBase QA commented on HBASE-22075:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 1m 35s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 9s | master passed |
| +1 | compile | 0m 27s | master passed |
| +1 | checkstyle | 0m 18s | master passed |
| +1 | shadedjars | 4m 32s | branch has no errors when building our shaded downstream artifacts. |
| +1 | javadoc | 0m 12s | master passed |
| 0 | spotbugs | 4m 46s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 0m 0s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 55s | the patch passed |
| +1 | compile | 0m 28s | the patch passed |
| +1 | javac | 0m 27s | the patch passed |
| +1 | checkstyle | 0m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedjars | 4m 32s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 15m 28s | Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. |
| +1 | javadoc | 0m 13s | the patch passed |
| +1 | findbugs | 0m 0s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 51s | hbase-it in the patch passed. |
| +1 | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 44m 41s | |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 base: https://builds.apache.org/job/PreCommit-HBASE-Build/976/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972474/HBASE-22075.test-only.2.patch |
| Optional Tests | dupname asflicense javac javadoc unit shadedjars hadoopcheck xml compile spotbugs findbugs hbaseanti checkstyle |
| uname | Linux badf90de922d 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / d7deafa120 |
| Default Java | 1.8.0_181 |
| Test Results |
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882070#comment-16882070 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

bq. As for HBASE-16812, I do not think it is the only patch which affected MOB in a bad way - there should be others. I am saying that, because my own test failed with HBASE-16812 reverted (on HDP-2.6.5).

Yeah, I agree here. I finished backporting HBASE-16812 onto CDH5.13.3 last night and the IT still shows no data loss.

bq. To prevent non-atomic failures we will need acid txs? No?

To prevent cross-region non-atomic failures, yes. But I don't think we need to prevent that; we just need to update the logic that commits the updated refs so that it handles non-atomic failure. The bulk load code assumes someone will handle the failure externally, and then we don't.

I think we should instead use the per-region atomic commit of bulk loaded files to track when we've successfully made use of our newly compacted files. If we can't succeed on retry for a given region, we can just keep the old MOB files around for the references in that region and then clean things up on the next MOB compaction.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
>                 Key: HBASE-22075
>                 URL: https://issues.apache.org/jira/browse/HBASE-22075
>             Project: HBase
>          Issue Type: Bug
>          Components: mob
>    Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Critical
>              Labels: compaction, mob
>             Fix For: 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during the last step (bulk load of a newly created reference file), there is a high chance of data loss due to a partially loaded reference file whose cells refer to a (now) non-existent MOB file. The newly created MOB file is deleted automatically when a MOB compaction fails, but some cells with references to this file might already have been loaded into HBase.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
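The per-region commit idea in the comment above might look roughly like the following. This is a sketch under stated assumptions rather than the eventual patch: `PerRegionRefCommit`, `Result`, and the `Predicate` standing in for "atomically commit the new reference file to this region" are hypothetical names, and regions are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Sketch of tracking the reference commit region by region (hypothetical names). */
final class PerRegionRefCommit {

    /** Outcome of the commit phase of one MOB compaction. */
    static final class Result {
        final List<String> committedRegions = new ArrayList<>();
        final List<String> pendingRegions = new ArrayList<>();

        /** Old MOB files may go only once no region still references them. */
        boolean oldMobFilesDeletable() {
            return pendingRegions.isEmpty();
        }
    }

    /**
     * Tries the atomic per-region commit up to maxAttempts times per region.
     * The predicate is assumed to be all-or-nothing for a single region.
     */
    static Result commit(List<String> regions, Predicate<String> atomicRegionCommit, int maxAttempts) {
        Result result = new Result();
        for (String region : regions) {
            boolean done = false;
            for (int attempt = 0; attempt < maxAttempts && !done; attempt++) {
                done = atomicRegionCommit.test(region);
            }
            if (done) {
                result.committedRegions.add(region);
            } else {
                result.pendingRegions.add(region); // retry on the next MOB compaction
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Region r2 never accepts the commit in this run.
        Result r = commit(Arrays.asList("r1", "r2", "r3"), region -> !region.equals("r2"), 3);
        System.out.println(r.committedRegions);       // prints [r1, r3]
        System.out.println(r.pendingRegions);         // prints [r2]
        System.out.println(r.oldMobFilesDeletable()); // prints false
    }
}
```

In this model the new MOB file is never deleted after a partial failure, since the committed regions already reference it; only the cleanup of the old files is deferred.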
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880569#comment-16880569 ]

Vladimir Rodionov commented on HBASE-22075:
-------------------------------------------

To prevent non-atomic failures, won't we need ACID transactions?

As for HBASE-16812, I do not think it is the only patch that affected MOB in a bad way; there should be others. I say that because my own test failed with HBASE-16812 reverted (on HDP-2.6.5).
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880489#comment-16880489 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

My current opinion is that there are a couple of different issues to solve here.

1) I found that every place where this particular data loss test shows a problem includes HBASE-16812. Before that change there was a lock preventing overlaps between compaction and MOB compaction. CDH5's backport of the MOB feature does not include this change. Since that change is too far back to easily revert in master or branches-2, I'm going to test this theory by backporting it on top of CDH5 and seeing whether the IT then shows the data loss. Will report back.

2) Independent of the problem with races between compaction and MOB compaction, I think the use of bulk load to commit the updated ref files is subject to non-atomic failure. We should either confirm that it isn't or rework how we commit the updated MOB references. My intuition is that we should be able to do this region by region, using the building blocks that bulk loading is based on, without needing to completely overhaul MOB accounting or MOB compaction (e.g. we shouldn't need something like the distributed procedure based MOB compaction from HBASE-15381).
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877463#comment-16877463 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

In CDH5 the default is {{Max(10, # regions + 1)}}, which is the same as master. Master gets the region count from the new async region locator, but presumably it gives the same answer.
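As a worked example of the default quoted above, here is the formula on its own. Only the expression `max(10, regions + 1)` comes from the comment; the class and method names are made up for this sketch, and it does not read any real HBase configuration.

```java
/** Worked example of the default retry count described above (hypothetical helper). */
final class BulkLoadRetryDefault {

    /** max(10, regionCount + 1), as stated in the comment. */
    static int defaultRetries(int regionCount) {
        return Math.max(10, regionCount + 1);
    }

    public static void main(String[] args) {
        System.out.println(defaultRetries(4));   // prints 10: the floor applies to small tables
        System.out.println(defaultRetries(250)); // prints 251: regions + 1 for large tables
    }
}
```

So for any table with fewer than ten regions the budget is a flat 10 attempts, however many transient failures occur.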
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877433#comment-16877433 ]

Vladimir Rodionov commented on HBASE-22075:
-------------------------------------------

[~busbey], what is the value of *hbase.bulkload.retries.number* for MOB reference file loading in CDH5? Is it Integer.MAX_VALUE?
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877429#comment-16877429 ]

Vladimir Rodionov commented on HBASE-22075:
-------------------------------------------

I think *hbase.bulkload.retries.number* on the master branch is equal to the number of regions in the table + 1. Shouldn't that work in any case?

The original (first) patch on this ticket handles bulk load failures due to an insufficient number of retries, but it cannot handle the race conditions created during a *ReproMOBDataLoss.java* run. My guess is that this may not be a MOB issue at all. Is something not MOB related different in CDH 5? Maybe the general compaction code?
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877397#comment-16877397 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

I've been having a lot of trouble trying to reproduce this failure using the IT on the CDH5 backport of the MOB feature. It's been confusing, since the code that I think is responsible is essentially the same: bulk loading the updated references into the non-MOB regions.

At the end of last week I started looking at the bulk load code to see if that was the difference. [~npopa] suggested it was a difference in how many times failures are retried by default. It doesn't look obviously different. However, if I take the master branch and update the {{PartitionedMobCompactor}} to set {{hbase.bulkload.retries.number}} to {{Integer.MAX_VALUE}} on the conf it uses when calling bulk load, then the IT no longer reproduces the failure.

So even if it isn't specifically the bulk load retries that are different, I think we're on the right track: the current "if it fails, just bail" approach is the source of the inconsistency.

I'm going to see if I can rework how we commit the post-compaction references in the non-MOB regions so that it expressly handles error conditions.
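To make the difference between a bounded retry budget and {{Integer.MAX_VALUE}} concrete, here is a small simulation. It is a sketch only: it assumes the setting behaves like a plain attempt count, and `RetryBudget`, `loadWithRetries`, and `flaky` are hypothetical names that do not correspond to anything in the HBase bulk load code.

```java
import java.util.function.BooleanSupplier;

/** Simulation of "bail when retries run out" versus "keep retrying" (hypothetical names). */
final class RetryBudget {

    /** Returns true if an attempt succeeded within maxRetries tries. */
    static boolean loadWithRetries(BooleanSupplier attempt, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false; // the caller bails here, possibly after a partial load
    }

    /** A fake load that fails transiently `failures` times and then succeeds. */
    static BooleanSupplier flaky(int failures) {
        int[] remaining = {failures};
        return () -> remaining[0]-- <= 0;
    }

    public static void main(String[] args) {
        // Twelve transient failures exceed a budget of 10 attempts.
        System.out.println(loadWithRetries(flaky(12), 10));                // prints false
        // With an effectively unbounded budget the thirteenth attempt succeeds.
        System.out.println(loadWithRetries(flaky(12), Integer.MAX_VALUE)); // prints true
    }
}
```

An unbounded budget only hides the problem while failures stay transient, which fits the point above that the commit step itself should handle errors.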
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870019#comment-16870019 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

v2 also reproduced the issue on branch-2.
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869948#comment-16869948 ]

Vladimir Rodionov commented on HBASE-22075:
-------------------------------------------

So, basically, what we have discovered and confirmed is: *the MOB feature is not safe to use*.

We have a patch which fixes the race condition and the related data loss, but the patch is quite large and is a major rework of the MOB feature itself. It has both pluses and minuses. On the plus side, MOB compaction is no longer orchestrated by the Master and can run in parallel. On the minus side, MOB data is compacted during regular major compactions, which imposes additional I/O load. It is a trade-off.

Meanwhile, the simplest mitigation for the data loss is converting the MOB table back to a regular one by setting *MOB_THRESHOLD* to a very large value, which must be larger than any potential MOB value size (the 100 below is only a placeholder for such a value):

# Alter the MOB table
{code}
hbase shell> disable 'table'
hbase shell> alter 'table', {NAME => 'column-family', MOB_THRESHOLD => 100}
hbase shell> enable 'table'
{code}
# Run a major compaction on this table
# Clean up the MOB directory data for the table (it's under /hbase/data/mobdir/data/'table')
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869860#comment-16869860 ]

HBase QA commented on HBASE-22075:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 29s | master passed |
| +1 | compile | 0m 25s | master passed |
| +1 | checkstyle | 0m 15s | master passed |
| +1 | shadedjars | 4m 33s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 0s | master passed |
| +1 | javadoc | 0m 11s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 10s | the patch passed |
| +1 | compile | 0m 24s | the patch passed |
| +1 | javac | 0m 24s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedjars | 4m 37s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 12m 49s | Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. |
| +1 | findbugs | 0m 0s | the patch passed |
| +1 | javadoc | 0m 11s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 44s | hbase-it in the patch passed. |
| +1 | asflicense | 0m 8s | The patch does not generate ASF License warnings. |
| | | 38m 7s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/567/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972474/HBASE-22075.test-only.2.patch |
| Optional Tests | dupname asflicense javac javadoc unit shadedjars hadoopcheck xml compile findbugs hbaseanti checkstyle |
| uname | Linux f886a245da36 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6d08ffcfc6 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/567/testReport/ |
| Max. process+thread count | 389 (vs. ulimit of 1) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/567/console |
| Powered
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869837#comment-16869837 ]

Sean Busbey commented on HBASE-22075:
-------------------------------------

I had a reproduction attempt time out because something went wonky and bubbled up as compaction failures. In v1, when that happens we immediately repeat the request instead of waiting. That eventually put enough pressure on things that my write attempts failed, which simply exited the data writing thread and left the test waiting for the full timeout.

test-only v2
- move to java.util.concurrent Executor to handle periodic tasks and waiting for an async task to be done
- if the data writing thread gets an error, end the test as a failure right away
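The two v2 changes listed above can be sketched with plain java.util.concurrent. This is not the attached test: `FailFastHarness`, the 10 ms delay, and the string results are assumptions chosen to keep the example short. It shows a periodic task running alongside a writer, with the writer's failure surfacing immediately instead of after the full timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Sketch of a fail-fast test harness (hypothetical names and timings). */
final class FailFastHarness {

    /** Returns "ok", "writer-failed", "timeout", or "interrupted". */
    static String run(Callable<Void> writer, Runnable periodic, long timeoutMs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        try {
            // Periodic work, e.g. requesting compactions, waits between runs.
            pool.scheduleWithFixedDelay(periodic, 0, 10, TimeUnit.MILLISECONDS);
            Future<Void> writing = pool.submit(writer);
            try {
                writing.get(timeoutMs, TimeUnit.MILLISECONDS);
                return "ok";
            } catch (ExecutionException e) {
                return "writer-failed"; // the writer threw: end the test right away
            } catch (TimeoutException e) {
                return "timeout";
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(() -> null, () -> { }, 1000));
        System.out.println(run(() -> { throw new java.io.IOException("write failed"); }, () -> { }, 1000));
    }
}
```

Because `Future.get` rethrows the writer's exception wrapped in an `ExecutionException`, the harness does not need a separate flag to notice that the writing thread died.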
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869589#comment-16869589 ]

HBase QA commented on HBASE-22075:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 38s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 13s | master passed |
| +1 | compile | 0m 25s | master passed |
| +1 | checkstyle | 0m 15s | master passed |
| +1 | shadedjars | 4m 37s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 0s | master passed |
| +1 | javadoc | 0m 11s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 12s | the patch passed |
| +1 | compile | 0m 25s | the patch passed |
| +1 | javac | 0m 25s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedjars | 4m 39s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 13m 1s | Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. |
| +1 | findbugs | 0m 0s | the patch passed |
| +1 | javadoc | 0m 10s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 0m 50s | hbase-it in the patch passed. |
| +1 | asflicense | 0m 9s | The patch does not generate ASF License warnings. |
| | | 38m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/566/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972450/HBASE-22075.test-only.1.patch |
| Optional Tests | dupname asflicense javac javadoc unit shadedjars hadoopcheck xml compile findbugs hbaseanti checkstyle |
| uname | Linux e98896e11ba7 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6d08ffcfc6 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/566/testReport/ |
| Max. process+thread count | 393 (vs. ulimit of 1) |
| modules | C: hbase-it U: hbase-it |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/566/console |
| Powered
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869545#comment-16869545 ] Sean Busbey commented on HBASE-22075: - test-only v1 - fixed checkstyle complaints. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.0.6, 2.2.1, 2.1.6 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, > ReproMOBDataLoss.java > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869261#comment-16869261 ]
HBase QA commented on HBASE-22075: --
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 38s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 38s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-it: The patch generated 17 new + 0 unchanged - 0 fixed = 17 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 41s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 13m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 51s{color} | {color:green} hbase-it in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 33s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/565/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12972408/HBASE-22075.test-only.0.patch |
| Optional Tests | dupname asflicense javac javadoc unit shadedjars hadoopcheck xml compile findbugs hbaseanti checkstyle |
| uname | Linux 008288261fc5 4.4.0-143-generic #169~14.04.2-Ubuntu SMP Wed Feb 13 15:00:41 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | master / 6d08ffcfc6 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/565/artifact/patchprocess/diff-checkstyle-hbase-it.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/565/testReport/ |
| Max.
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869245#comment-16869245 ] Sean Busbey commented on HBASE-22075: - the attached {{HBASE-22075.test-only.0.patch}} adapts [~kpalanisamy]'s reproduction into an integration test. If I let this new test run on my local laptop it currently produces a failure due to missing MOB files after about 1.5 hours. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.0.6, 2.2.1, 2.1.6 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > HBASE-22075.test-only.0.patch, ReproMOBDataLoss.java > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16868918#comment-16868918 ] Sean Busbey commented on HBASE-22075: - I think I have ReproMobDataLoss converted into an integration test. couple of changes and I should be able to post it. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.0.6, 2.2.1, 2.1.6 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > ReproMOBDataLoss.java > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842504#comment-16842504 ] stack commented on HBASE-22075: --- Moving out. Still WIP it seems. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.0.6, 2.2.1, 2.1.6 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > ReproMOBDataLoss.java > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832954#comment-16832954 ]
Karthik Palanisamy commented on HBASE-22075:

[~elserj] I tested [~vrodionov]'s patch. The fix keeps MOB data safe: I saw no data loss after a MOB compaction failure, an RS crash, IO issues, or other infrastructure issues.

However, a race condition shows up in MOB with or without the patch. We see data loss again when MOB compaction and major compaction run together. As [~vrodionov] already mentioned, there will be a race condition in this case. I think he is already working on a new patch.

I have attached a small repro (ReproMOBDataLoss.java) for this race condition. It is an aggressive test and takes nearly an hour.
# Settings: region size 200 MB, flush threshold 800 KB.
# Insert 10 million records.
# MOB compaction and archiver:
a) Trigger MOB compaction (every 2 minutes)
b) Trigger major compaction (every 2 minutes)
c) Trigger archive cleaner (every 3 minutes)
# Validate MOB data after the data load completes.

I ran this repro on branch-2.2 and the issue reproduced. I also ran it with MOB compaction disabled and saw no data loss.

> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: compaction, mob > Fix For: 2.0.6, 2.1.5, 2.2.1 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, > ReproMOBDataLoss.java > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file.
The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
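The trigger schedule listed in the comment above (MOB compaction and major compaction every 2 minutes, archive cleaner every 3 minutes) could be driven roughly as follows. This is a minimal sketch and not the attached ReproMOBDataLoss.java: the class name `ReproSchedule`, the `start` method, and the three `Runnable` parameters are hypothetical placeholders for whatever actually issues the compaction and cleaner requests, and the choice to fire each trigger for the first time after one full period is an assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical driver for the periodic triggers; the real work is supplied by the caller. */
class ReproSchedule {

    /**
     * Schedules the three triggers. One "tick" stands for one minute in the
     * described test; passing a smaller unit makes the schedule quick to try out.
     */
    static ScheduledExecutorService start(Runnable mobCompaction,
                                          Runnable majorCompaction,
                                          Runnable archiveCleaner,
                                          long tick, TimeUnit unit) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(3);
        // MOB compaction and major compaction share a 2-tick period, so they keep
        // firing at the same moments, which is the overlap the comment reports on.
        pool.scheduleAtFixedRate(guarded("mob compaction", mobCompaction), 2 * tick, 2 * tick, unit);
        pool.scheduleAtFixedRate(guarded("major compaction", majorCompaction), 2 * tick, 2 * tick, unit);
        pool.scheduleAtFixedRate(guarded("archive cleaner", archiveCleaner), 3 * tick, 3 * tick, unit);
        return pool;
    }

    /** A task that throws would silently cancel its own schedule, so log and continue. */
    private static Runnable guarded(String name, Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                System.err.println(name + " trigger failed: " + e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo with one-second ticks standing in for minutes.
        ScheduledExecutorService pool = start(
            () -> System.out.println("trigger MOB compaction"),
            () -> System.out.println("trigger major compaction"),
            () -> System.out.println("trigger archive cleaner"),
            1, TimeUnit.SECONDS);
        Thread.sleep(7_000);
        pool.shutdownNow();
    }
}
```

In a real run the three placeholders would call into the cluster while the data load is in progress, with validation only after the load finishes, as in step 4 of the list.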
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803562#comment-16803562 ]
Hadoop QA commented on HBASE-22075: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 13s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 7s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 3m 29s{color} | {color:red} hbase-server generated 3 new + 191 unchanged - 3 fixed = 194 total (was 194) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 12s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 3s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}262m 29s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}311m 39s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.quotas.TestSpaceQuotas |
| | hadoop.hbase.client.TestFromClientSideWithCoprocessor |
| | hadoop.hbase.master.procedure.TestSCPWithReplicas |
| | hadoop.hbase.replication.multiwal.TestReplicationSyncUpToolWithMultipleWAL |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963957/HBASE-22075-v2.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux f3692fd68972 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 59406c44e3 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
|
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803402#comment-16803402 ]
Vladimir Rodionov commented on HBASE-22075: ---

Patch v2, simplified. The only code change now: we no longer delete the compacted MOB file when the bulk load phase fails. The reason is explained in previous comments. There is no longer any need to artificially inflate the number of retries, because post-2.0 the LoadIncrementalHFiles tool calculates the number of attempts based on the current number of regions in the table.

> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
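To make the effect of the v2 change concrete, here is a toy model of the failure described in the issue text: the bulk load of the new reference file succeeds for one region and fails for another, and the new MOB file is then either deleted (old behaviour) or kept (v2). It is a sketch under stated assumptions, not HBase code: plain collections stand in for regions and MOB files, and the class name `MobBulkLoadFailureModel`, the file names, and the row keys are all invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only: no HBase API is used; every name here is hypothetical. */
class MobBulkLoadFailureModel {

    /**
     * Simulates one MOB compaction whose final bulk load is only partially
     * applied, then counts reference cells that point at a missing MOB file.
     */
    static int danglingReferences(boolean deleteNewMobFileOnFailure) {
        // MOB files currently present.
        Set<String> mobFiles = new HashSet<>();
        mobFiles.add("mob-old-1");
        mobFiles.add("mob-old-2");

        // Each region maps a row key to the MOB file its reference cell names.
        Map<String, String> regionA = new HashMap<>();
        regionA.put("row1", "mob-old-1");
        Map<String, String> regionB = new HashMap<>();
        regionB.put("row9", "mob-old-2");

        // Compaction writes a new MOB file that holds the data of both old files.
        mobFiles.add("mob-new");

        // Bulk load of the new references: region A's part lands, region B's part
        // fails, so the compaction as a whole is reported as failed.
        regionA.put("row1", "mob-new");

        if (deleteNewMobFileOnFailure) {
            // Old behaviour: the new MOB file is cleaned up on failure, even though
            // region A already refers to it.
            mobFiles.remove("mob-new");
        }

        int dangling = 0;
        for (String file : regionA.values()) {
            if (!mobFiles.contains(file)) dangling++;
        }
        for (String file : regionB.values()) {
            if (!mobFiles.contains(file)) dangling++;
        }
        return dangling;
    }

    public static void main(String[] args) {
        System.out.println(danglingReferences(true));   // prints 1
        System.out.println(danglingReferences(false));  // prints 0
    }
}
```

Keeping the new file leaves some MOB data stored twice after a failed compaction, but every loaded reference still resolves.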
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801941#comment-16801941 ] Josh Elser commented on HBASE-22075: Sorry! I think we were just talking past each other. Want to include your test in a second revision of your patch? Someone else can run your test to validate the issue being reproduced, and then with QA passing, we can call that a regression test :). I think that's sufficient for me. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801925#comment-16801925 ]
Vladimir Rodionov commented on HBASE-22075: ---

{quote} What about your sketch of a test doesn't work with your fix? {quote}
Not sure I am following you here, [~elserj]. Without the fix, the issue can be reproduced; I already have a working unit test. To show that the patch fixes the issue, we need to run this test twice, with the patch disabled and with it enabled. That is what I call a hack.

> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.6, 2.1.5 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801730#comment-16801730 ] Josh Elser commented on HBASE-22075: bq. To reproduce this with the patch we will need some special test-related hacks in the code. I'm curious: what you describe sounds like an effective test. What about your sketch of a test doesn't work with your fix? > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.1.4, 2.0.6 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799257#comment-16799257 ]
Vladimir Rodionov commented on HBASE-22075: ---

{quote} Is this something that we can show in a unit test? {quote}
I have one, but it works only without the patch, [~elserj]. The scenario is the following:
0. Set "hbase.bulkload.retries.number" to 2
1. Create a MOB table (1 region)
2. Load some MOB data
3. Flush
4. Load MOB data again
5. Flush
6. Now we have 2 store files and 2 MOB files
7. Split the table into at least 4 regions
8. Trigger MOB compaction. The compaction should fail with some data loaded partially
9. Verify that data is missing.

To reproduce this with the patch we will need some special test-related hacks in the code.

> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799031#comment-16799031 ]
Josh Elser commented on HBASE-22075:

{quote}Forced splits during bulkload due to changed region boundaries, [~elserj]. {quote}
Ah, the import of the ref file itself being split (even though it has one entry). I'm with you now. I'll have to look at how finding MOB files works on split (there must be some logic for the daughter regions to find the MOB files from the parent region, right?).

So, we end up trying to bulk-load a ref file into two regions (after a split): one load succeeds and the other doesn't. Thus, that new ref file overwrites the old refs, and if we clean up the MOB files, we lose data. Ok, I can see that now :)

Is this something that we can show in a unit test? I feel like that would go a long way to help prevent data loss in the future.

> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798500#comment-16798500 ]
Hadoop QA commented on HBASE-22075: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange} 0m 0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 30s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 15s{color} | {color:red} hbase-server: The patch generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 33s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 36s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}146m 46s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m 21s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestAsyncTableGetMultiThreaded |
| | hadoop.hbase.master.TestMasterAbortAndRSGotKilled |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-22075 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12963194/HBASE-22075-v1.patch |
| Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux a4f6d54c7996 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / cbd9c9b2e1 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
|
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798418#comment-16798418 ] Vladimir Rodionov commented on HBASE-22075: --- {quote} What is creating multiple files? {quote} Forced splits during bulkload due to changed region boundaries, [~elserj]. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798396#comment-16798396 ] Josh Elser commented on HBASE-22075: {quote}some files (or parts of files after splitting) can be loaded, some may fail {quote} What is creating multiple files? Doesn't this compact create a single MOB file and then one reference file to that file? It looks like we write the MOB file in a tmp dir and then directly mv it into the mobFamily for the Region whose MOB files we're compacting. Maybe I need to look farther up the call stack to get it? {quote}These are artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile* method, but only bulkloadDir, where ref file is located. {quote} Ok, looks like {{bulkloadPathOfPartition}} is an ancestor of {{bulkloadColumnPath}} which also makes this harder to follow. Thanks for clarifying that. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798385#comment-16798385 ] Vladimir Rodionov commented on HBASE-22075: --- {quote} This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing. {quote} These are artifacts of some code cleaning/modifications. The fileName argument is not used at all in the *bulkloadRefFile* method; only bulkloadDir, where the ref file is located, is used. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798374#comment-16798374 ] Vladimir Rodionov commented on HBASE-22075: --- {quote} Another question: In the case where we successfully created the new MOB file but the bulk loading of the ref file failed, how is deleting the new mob file causing data loss? The bulk load should be atomic, right? (Either all files are loaded or no files are loaded) I would think that if the bulk load of the ref file failed, we would be safe to get rid of that new MOB file because we still have all of the old MOB files. {quote} Bulkload is atomic per region only. If you have file(s) which span(s) several regions - no atomicity for you. In the latter case, some files (or parts of files after splitting) can be loaded while others fail. You get a partially successful operation and, as a result, data loss: the new MOB file was deleted because of the compaction failure, but some parts of the new reference file(s) were already loaded. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
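The per-region atomicity described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions: `PartialBulkLoadSketch`, `bulkLoad`, and `danglingRefs` are hypothetical names, regions are identified by their start keys, and nothing here is HBase API. It shows how one failing region plus the cleanup of the new MOB file leaves a loaded reference cell pointing at a file that no longer exists.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: these are not HBase classes or APIs.
class PartialBulkLoadSketch {

  // Loads reference cells (row key -> MOB file name) region by region.
  // A region takes all of its cells or none of them, but a failed region
  // does not roll back the regions that already succeeded.
  static Map<String, String> bulkLoad(Map<String, String> refCells,
      TreeSet<String> regionStartKeys, Set<String> failingRegions) {
    Map<String, String> loaded = new TreeMap<>();
    for (Map.Entry<String, String> cell : refCells.entrySet()) {
      // The owning region is the one with the greatest start key <= row key.
      String region = regionStartKeys.floor(cell.getKey());
      if (region == null) {
        throw new IllegalArgumentException("no region covers row " + cell.getKey());
      }
      if (!failingRegions.contains(region)) {
        loaded.put(cell.getKey(), cell.getValue());
      }
    }
    return loaded;
  }

  // Counts loaded reference cells whose MOB file no longer exists.
  static long danglingRefs(Map<String, String> loaded, Set<String> existingMobFiles) {
    return loaded.values().stream()
        .filter(mobFile -> !existingMobFiles.contains(mobFile))
        .count();
  }

  public static void main(String[] args) {
    Map<String, String> refCells = new TreeMap<>();
    refCells.put("a", "mob-new");
    refCells.put("m", "mob-new");
    TreeSet<String> regions = new TreeSet<>(Arrays.asList("", "k"));

    // The region starting at "k" rejects its part of the reference file.
    Map<String, String> loaded =
        bulkLoad(refCells, regions, Collections.singleton("k"));

    // Cleanup after the "failed" compaction deletes mob-new.
    Set<String> existingMobFiles = Collections.emptySet();
    System.out.println("loaded rows: " + loaded.keySet());
    System.out.println("dangling references: " + danglingRefs(loaded, existingMobFiles));
  }
}
```

In this model the row `a` is loaded by the first region while the second region fails, so deleting `mob-new` afterwards leaves one dangling reference, which is the data-loss case under discussion.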
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798366#comment-16798366 ] Josh Elser commented on HBASE-22075: {quote}This sounds like a case where we ought to be using procedures? {quote} Yeah, sounds like this should eventually be rewritten. I'm having a hard time wrapping my head around this code, [~vrodionov]. I'm hoping you can clarify some of what this is doing... Just looking at {{PartitionedMobCompactor#compactMobFilesInBatch}}, we open a writer for the MOB file and then the reference file for that MOB file. The thing I don't understand is that we open a writer for that ref file, but we don't appear to _do_ anything with it:
{code:java}
writer = MobUtils
    .createWriter(conf, fs, column, partition.getPartitionId().getLatestDate(), tempPath,
        Long.MAX_VALUE, column.getCompactionCompressionType(),
        partition.getPartitionId().getStartKey(), compactionCacheConfig, cryptoContext,
        true);
cleanupTmpMobFile = true;
filePath = writer.getPath();
...
// create a temp file and open a writer for it in the bulkloadPath
refFileWriter = MobUtils.createRefFileWriter(conf, fs, column, bulkloadColumnPath,
    fileInfo.getSecond().longValue(), compactionCacheConfig, cryptoContext, true);
...
for (Cell cell : cells) {
  // write the mob cell to the mob file.
  writer.append(cell);
  // write the new reference cell to the store file.
  Cell reference = MobUtils.createMobRefCell(cell, fileName, this.refCellTags);
  refFileWriter.append(reference);
  mobCells++;
}
...
if (cleanupBulkloadDirOfPartition) {
  // append metadata and bulkload info to the ref mob file, and close the writer.
  closeRefFileWriter(refFileWriter, fileInfo.getFirst(), request.selectionTime);
}
...
// bulkload the ref file
bulkloadRefFile(connection, table, bulkloadPathOfPartition, filePath.getName());
{code}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
Another question: In the case where we successfully created the new MOB file but the bulk loading of the ref file failed, how is deleting the new mob file causing data loss? The bulk load should be atomic, right? (Either all files are loaded or no files are loaded) I would think that if the bulk load of the ref file failed, we would be safe to get rid of that new MOB file because we still have all of the old MOB files. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug > Components: mob >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797710#comment-16797710 ] Vladimir Rodionov commented on HBASE-22075: --- Marked as Critical. All 2.0, 2.1, and 2.2 versions are affected. I will double-check the code again, but at least the Master-orchestrated MOB compaction code is still in 2.x. I am not sure whether it is still used. > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, > 2.1.3 >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Critical > Labels: mob > Fix For: 2.2.0, 2.0.5, 2.1.4 > > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797704#comment-16797704 ] Sean Busbey commented on HBASE-22075: - This sounds like a case where we ought to be using procedures? > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797701#comment-16797701 ] Sean Busbey commented on HBASE-22075: - What are the affected version(s)? Sounds like this should be marked Critical? > Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797644#comment-16797644 ] Vladimir Rodionov commented on HBASE-22075: --- The patch v1. The patch does the following:
# Increases the number of bulk load retries (for MOB compaction only) to 1000, which can be overridden.
# Keeps the newly created MOB file even if compaction fails. This will be taken care of the next time MOB compaction runs.
> Potential data loss when MOB compaction fails > - > > Key: HBASE-22075 > URL: https://issues.apache.org/jira/browse/HBASE-22075 > Project: HBase > Issue Type: Bug >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov >Priority: Major > Attachments: HBASE-22075-v1.patch > > > When MOB compaction fails during last step (bulk load of a newly created > reference file) there is a high chance of a data loss due to partially loaded > reference file, cells of which refer to (now) non-existent MOB file. The > newly created MOB file is deleted automatically in case of a MOB compaction > failure, but some cells with the references to this file might be loaded to > HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
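The two changes listed in the comment above can be sketched roughly as follows. This is not the patch itself: `MobCompactionRetrySketch`, `BulkLoad`, `commitAndLoad`, and `DEFAULT_MAX_ATTEMPTS` are hypothetical names, and treating the retry count as a cap on total attempts is an assumption. The sketch only shows the shape of the idea: retry the bulk load many times, and never delete the new MOB file when the retries run out.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: these are not HBase classes or APIs.
class MobCompactionRetrySketch {

  // Assumed default, taken from the "1000" mentioned in the comment above.
  static final int DEFAULT_MAX_ATTEMPTS = 1000;

  // Hypothetical stand-in for the bulk load of the reference file.
  interface BulkLoad {
    void run() throws Exception;
  }

  // Commits the new MOB file, then tries the bulk load up to maxAttempts times.
  // Returns true if a bulk load attempt succeeded. The new MOB file stays in
  // mobFiles either way, so partially loaded references never dangle.
  static boolean commitAndLoad(Set<String> mobFiles, String newMobFile,
      BulkLoad bulkLoad, int maxAttempts) {
    mobFiles.add(newMobFile);
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        bulkLoad.run();
        return true;
      } catch (Exception e) {
        // Swallow and retry; a real implementation would log and back off.
      }
    }
    // Deliberately no mobFiles.remove(newMobFile) here.
    return false;
  }

  public static void main(String[] args) {
    Set<String> mobFiles = new HashSet<>();
    int[] calls = {0};
    // Fails twice, then succeeds on the third attempt.
    boolean ok = commitAndLoad(mobFiles, "mob-new", () -> {
      if (++calls[0] < 3) {
        throw new Exception("region boundaries changed");
      }
    }, DEFAULT_MAX_ATTEMPTS);
    System.out.println("loaded=" + ok + " attempts=" + calls[0] + " files=" + mobFiles);
  }
}
```

The trade-off in this model is that a failed compaction may leave an extra MOB file behind, which costs storage until a later compaction picks it up, but a reference cell that did get loaded always has a file to resolve against.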