[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985239#comment-16985239 ]

Bo Cui edited comment on HBASE-22075 at 12/2/19 6:24 AM:

Is it better to add the HBASE-23353 solution? pls check [~vrodionov]

was (Author: bo cui):
Is it better to add the HBASE-23353 solution? pls check

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.7, 2.2.3, 2.1.9
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch,
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
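The failure mode in the issue description can be illustrated with a toy model. This is a minimal sketch in plain Python, not HBase code: `ToyStore`, `compact`, and the `fail_after` parameter are hypothetical names invented for the illustration, and the model simplifies heavily (one reference per row, no cell versions). It only shows how a bulk load that stops part-way, followed by deletion of the new MOB file, leaves references that point at nothing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Toy model: MOB files on disk plus the reference cells readers see."""
    mob_files: dict = field(default_factory=dict)  # file name -> {row: value}
    ref_cells: dict = field(default_factory=dict)  # row -> MOB file name

    def read(self, row):
        """Resolve a reference cell; None models a value that is lost."""
        mob_file = self.mob_files.get(self.ref_cells[row])
        return None if mob_file is None else mob_file.get(row)


def compact(store, old_files, new_name, fail_after=None):
    """Merge old_files into new_name, then load the new references row by row.

    fail_after is a hypothetical knob: the number of references loaded
    before a simulated bulk load failure (None means no failure).
    """
    merged = {}
    for name in old_files:
        merged.update(store.mob_files[name])
    store.mob_files[new_name] = merged

    for loaded, row in enumerate(sorted(merged)):
        if fail_after is not None and loaded == fail_after:
            # Cleanup as described in the issue: the new MOB file is deleted,
            # but the references that were already loaded stay behind.
            del store.mob_files[new_name]
            return False
        store.ref_cells[row] = new_name

    for name in old_files:
        del store.mob_files[name]
    return True


store = ToyStore(
    mob_files={"mob-a": {"r1": b"v1", "r2": b"v2"}, "mob-b": {"r3": b"v3"}},
    ref_cells={"r1": "mob-a", "r2": "mob-a", "r3": "mob-b"},
)
ok = compact(store, ["mob-a", "mob-b"], "mob-new", fail_after=2)
dangling = [row for row in sorted(store.ref_cells) if store.read(row) is None]
print(ok, dangling)  # False ['r1', 'r2']
```

In this model the two rows whose references were loaded before the failure become unreadable, while the row that was never reloaded still resolves through its old MOB file.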
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985239#comment-16985239 ]

Bo Cui edited comment on HBASE-22075 at 11/30/19 3:22 AM:

Is it better to add the HBASE-23353 solution? pls check

was (Author: bo cui):
Is it better to add the hbase-23353 solution? pls check

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.7, 2.2.3, 2.1.9
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch,
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877429#comment-16877429 ]

Vladimir Rodionov edited comment on HBASE-22075 at 7/3/19 2:17 AM:

On the master branch, *hbase.bulkload.retries.number* is equal to the number of regions in a table + 1, no? This should work in any case? The original (first) patch on this ticket handles bulkload failures due to an insufficient number of retries, but cannot handle the race conditions created during a *ReproMOBDataLoss.java* run. My guess is that this may not be a MOB issue at all. Is something not MOB related different in CDH 5? Maybe the general compaction code?

was (Author: vrodionov):
I think, *hbase.bulkload.retries.number* in a master branch is equals to # of regions in a table + 1. This should work in any case? The original (first one) patch to this ticket handles bulkload failures due to insufficient # of retries, but can not handle race conditions created during *ReproMOBDataLoss.java* run. This can be not a MOB issue at all, my guess. Something not MOB related is different in CDH 5? May be general compaction code?

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch,
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877429#comment-16877429 ]

Vladimir Rodionov edited comment on HBASE-22075 at 7/3/19 2:16 AM:

I think, *hbase.bulkload.retries.number* in a master branch is equals to # of regions in a table + 1. This should work in any case? The original (first one) patch to this ticket handles bulkload failures due to insufficient # of retries, but can not handle race conditions created during *ReproMOBDataLoss.java* run. This can be not a MOB issue at all, my guess. Something not MOB related is different in CDH 5? May be general compaction code?

was (Author: vrodionov):
I think, *hbase.bulkload.retries.number* in a master branch is equals to # of regions in a table + 1. This should work in any case? The original (first one) patch to this ticket handles bulkhead failures due to insufficient # of retries, but can not handle race conditions created during *ReproMOBDataLoss.java* run. This can be not a MOB issue at all, my guess. Something not MOB related is different in CDH 5? May be general compaction code?

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch,
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16869948#comment-16869948 ]

Vladimir Rodionov edited comment on HBASE-22075 at 6/21/19 11:22 PM:

So, basically, what we have discovered and confirmed is: *MOB feature is not safe to use*. We have a patch which fixes the race condition issue and the related data loss, but the patch is quite large and is a major rework of the MOB feature itself. It has both pluses and minuses: on the plus side, MOB compaction is no longer orchestrated by the Master and can be run in parallel; on the minus side, MOB data is compacted during regular major compactions, which imposes additional I/O load. Kind of a pro and contra thing.

Meanwhile, the simplest mitigation for the data loss is converting the MOB table back to a regular one by setting *MOB_THRESHOLD* to a very large value, which must be larger than any potential MOB value size:
# Alter MOB table
{code}
hbase shell> disable 'table'
hbase shell> alter 'table', {NAME => 'column-family', MOB_THRESHOLD => 100}
hbase shell> enable 'table'
{code}
# Run major compaction on this table
# Clean up MOB directory data for the table (it is under /hbase/data/mobdir/data/'table')

was (Author: vrodionov):
So, basically what we have discovered and confirmed is : *MOB feature is not safe to use*. We have a patch, which fixes race condition issue and related data loss, but the patch is quite large and a major rework of a MOB feature itself. It has both: pluses and minuses: On a plus side - MOB compaction is not Master orchestrated anymore and can be run in parallel, on a minus side - MOB data is compacted during normal regular major compactions and it imposes additional I/O load. Kind of pro- and contra- thing.

Meanwhile, the simplest mitigation/solution to a data loss is conversion MOB table back to a regular one by setting *MOB_THRESHOLD* to a very large value, which must to be large than any potential MOB value size:
# Alter MOB table
{code}
hbase shell> disable ‘table’
hbase shell> alter ‘table’, {NAME => ‘column-family’, MOB_THRESHOLD => 100}
hbase shell> enable ‘table’
{code}
# Run major compaction on this table
# Clean up MOB directory data for the table (its under /hbase/data/mobdir/data/'table')

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch,
> HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
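The threshold mitigation in the comment above rests on a size comparison: a value is treated as MOB only when it is larger than *MOB_THRESHOLD*, so a threshold above every stored value leaves nothing to be written as MOB. The sketch below illustrates that comparison in plain Python under stated assumptions: `is_mob_cell` and `threshold_above_all` are hypothetical helpers, the value sizes are made up, and the 100 KB figure is only an example threshold. It does not talk to HBase.

```python
# Example threshold in bytes (100 KB); an assumption for illustration only.
EXAMPLE_THRESHOLD = 100 * 1024


def is_mob_cell(value_size, mob_threshold):
    """A value counts as MOB only when its size exceeds the threshold."""
    return value_size > mob_threshold


def threshold_above_all(value_sizes, headroom=2):
    """Hypothetical helper: pick a threshold above every observed value size."""
    return max(value_sizes) * headroom


# Made-up value sizes in bytes for one column family.
value_sizes = [50_000, 200_000, 5_000_000]

before = [s for s in value_sizes if is_mob_cell(s, EXAMPLE_THRESHOLD)]
raised = threshold_above_all(value_sizes)
after = [s for s in value_sizes if is_mob_cell(s, raised)]

print(before)  # [200000, 5000000]
print(raised)  # 10000000
print(after)   # []
```

Raising the threshold only changes how values are classified from then on; the major compaction in the second step of the list is what rewrites the data that is already stored as MOB.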
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16842504#comment-16842504 ]

stack edited comment on HBASE-22075 at 5/17/19 8:29 PM:

Moving out of 2.1.5 to 2.1.6. Still WIP it seems.

was (Author: stack):
Moving out. Still WIP it seems.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: compaction, mob
> Fix For: 2.0.6, 2.2.1, 2.1.6
>
> Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch,
> ReproMOBDataLoss.java
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798385#comment-16798385 ]

Vladimir Rodionov edited comment on HBASE-22075 at 3/21/19 8:00 PM:

{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
These are artifacts of some code cleaning/modifications. The fileName argument is not used at all in the *bulkloadRefFile* method; only bulkloadDir, where the ref file is located, is used.

was (Author: vrodionov):
{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
This are artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile* method, but only bulkloadDir, where ref file is located.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: mob
> Fix For: 2.2.0, 2.0.5, 2.1.4
>
> Attachments: HBASE-22075-v1.patch
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798385#comment-16798385 ]

Vladimir Rodionov edited comment on HBASE-22075 at 3/21/19 8:00 PM:

{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
This are artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile* method, but only bulkloadDir, where ref file is located.

was (Author: vrodionov):
{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
This are artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile*method, but only bulkloadDir, where ref file is located.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: mob
> Fix For: 2.2.0, 2.0.5, 2.1.4
>
> Attachments: HBASE-22075-v1.patch
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798385#comment-16798385 ]

Vladimir Rodionov edited comment on HBASE-22075 at 3/21/19 8:00 PM:

{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
This are artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile*method, but only bulkloadDir, where ref file is located.

was (Author: vrodionov):
{quote}
This looks to me that we end up bulk-loading the MOB file that we created, not the ref file? It's not clear to me what this is supposed to be doing.
{quote}
This is artifacts of some code cleaning/modifications. The fileName argument is not used at all in *bulkloadRefFile*method, but only bulkloadDir, where ref file is located.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Components: mob
> Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Critical
> Labels: mob
> Fix For: 2.2.0, 2.0.5, 2.1.4
>
> Attachments: HBASE-22075-v1.patch
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HBASE-22075) Potential data loss when MOB compaction fails
[ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797644#comment-16797644 ]

Vladimir Rodionov edited comment on HBASE-22075 at 3/20/19 11:17 PM:

The patch v1. The patch does the following:
# Increases the number of bulk load retries (for MOB compaction only) to 1000, which can be overridden
# Keeps the newly created MOB file even if compaction fails. The file will be taken care of the next time MOB compaction runs.

cc: [~elserj], [~busbey]

was (Author: vrodionov):
The patch v1. The patch do the following:
# Increases number of bulk load retries (for MOB compaction only) to 1000, which can be overwritten
# Keeps newly created MOB file even if compaction fails. This will be taken care of next time MOB compaction will run.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
> Key: HBASE-22075
> URL: https://issues.apache.org/jira/browse/HBASE-22075
> Project: HBase
> Issue Type: Bug
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Attachments: HBASE-22075-v1.patch
>
>
> When MOB compaction fails during last step (bulk load of a newly created
> reference file) there is a high chance of a data loss due to partially loaded
> reference file, cells of which refer to (now) non-existent MOB file. The
> newly created MOB file is deleted automatically in case of a MOB compaction
> failure, but some cells with the references to this file might be loaded to
> HBase.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
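The two behaviours listed for patch v1 (a high, overridable retry limit and no deletion of the new MOB file on failure) can be sketched roughly as follows. This is a plain Python illustration, not the patch itself: `bulk_load_with_retries`, `finish_compaction`, and `make_flaky_loader` are hypothetical names, and only the retry limit of 1000 comes from the comment above.

```python
def bulk_load_with_retries(load_once, max_retries=1000):
    """Call load_once() until it succeeds or max_retries attempts are used.

    Returns the number of attempts taken, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        if load_once():
            return attempt
    return None


def finish_compaction(load_once, mob_files, new_mob_file, max_retries=1000):
    """Last compaction step: bulk load the references, never delete on failure."""
    attempts = bulk_load_with_retries(load_once, max_retries)
    # Deliberately no mob_files.discard(new_mob_file) when attempts is None:
    # already-loaded references keep a file to point at, and a later
    # compaction run is expected to deal with the leftover file.
    return attempts, new_mob_file in mob_files


def make_flaky_loader(failures):
    """Hypothetical loader that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def load_once():
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return load_once


mob_files = {"mob-new"}
succeeded = finish_compaction(make_flaky_loader(3), mob_files, "mob-new")
gave_up = finish_compaction(make_flaky_loader(3), mob_files, "mob-new", max_retries=2)
print(succeeded)  # (4, True)
print(gave_up)    # (None, True)
```

In the second call the retry budget runs out, yet the new MOB file is still present, which is the property that prevents dangling references in this sketch.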