[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary updated HIVE-20901:
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Duplicate
           Status: Resolved  (was: Patch Available)

[~asomani]: If you do not mind I close this jira as it was fixed by HIVE-9995.
Sorry for the confusion, I have found this jira only now :(

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
>
>
> suppose we run minor compaction 2 times, via alter table
> The 2nd request to compaction should have nothing to do but I don't think
> there is a check for that.  It's visible in the context of HIVE-20823, where
> each compactor run produces a delta with new visibility suffix so we end up
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delete_delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_001_
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v019
> │   ├── _orc_acid_version
> │   └── bucket_0
> ├── delta_001_002_v021
> │   ├── _orc_acid_version
> │   └── bucket_0
> └── delta_002_002_
>     ├── _orc_acid_version
>     └── bucket_0
> {noformat}
> i.e. 2 deltas with the same write ID range
> this is bad.  Probably happens today as well but new run produces a delta
> with the same name and clobbers the previous one, which may interfere with
> writers
>
> need to investigate
>
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both
> deltas as if they were distinct and it effectively duplicates data.-  There
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the
> same {{writeid}} range

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
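
The situation described above (two delta directories covering the same write ID range) can be spotted mechanically from a directory listing. The following is a minimal sketch in Java, assuming only the naming visible in the listing, i.e. `delta_<min>_<max>_<suffix>` and `delete_delta_<min>_<max>_<suffix>`. The class `DeltaRangeCheck` and the method `duplicateRanges` are hypothetical names used for illustration; they are not part of Hive and do not reproduce what `AcidUtils.getAcidState()` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hive.
public class DeltaRangeCheck {
    // Assumed naming, taken from the listing:
    // (delete_)delta_<minWriteId>_<maxWriteId>[_<suffix>]
    private static final Pattern DELTA_DIR =
            Pattern.compile("^(delete_delta|delta)_(\\d+)_(\\d+)(?:_(.*))?$");

    /**
     * Groups directory names by kind and write ID range and returns only
     * the ranges that are covered by more than one directory.
     */
    public static Map<String, List<String>> duplicateRanges(List<String> dirNames) {
        Map<String, List<String>> byRange = new TreeMap<>();
        for (String name : dirNames) {
            Matcher m = DELTA_DIR.matcher(name);
            if (!m.matches()) {
                continue; // not a delta directory under the assumed naming
            }
            long min = Long.parseLong(m.group(2));
            long max = Long.parseLong(m.group(3));
            String key = m.group(1) + "[" + min + "," + max + "]";
            byRange.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Keep only ranges that appear at least twice.
        byRange.values().removeIf(dirs -> dirs.size() < 2);
        return byRange;
    }

    public static void main(String[] args) {
        // Directory names copied from the listing in the issue description.
        List<String> dirs = Arrays.asList(
                "delete_delta_001_002_v019",
                "delete_delta_001_002_v021",
                "delta_001_001_",
                "delta_001_002_v019",
                "delta_001_002_v021",
                "delta_002_002_");
        duplicateRanges(dirs).forEach(
                (range, names) -> System.out.println(range + " -> " + names));
    }
}
```

For the listing in the description, this reports the `delta` and `delete_delta` directories for write IDs 1 to 2, each present once per visibility suffix (`v019` and `v021`). It only flags the duplicates; which directory a reader should prefer is left open here.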
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Somani updated HIVE-20901:
-----------------------------------
    Status: Patch Available  (was: Open)

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Somani updated HIVE-20901:
-----------------------------------
    Attachment: HIVE-20901.2.patch

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Somani updated HIVE-20901:
-----------------------------------
    Status: Open  (was: Patch Available)

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Somani updated HIVE-20901:
-----------------------------------
    Status: Patch Available  (was: Open)

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Somani updated HIVE-20901:
-----------------------------------
    Attachment: HIVE-20901.1.patch

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-20901:
----------------------------------
    Description: 
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate

-The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both deltas as if they were distinct and it effectively duplicates data.-  There is no data duplication - {{getAcidState()}} will not use 2 deltas with the same {{writeid}} range

  was:
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate

-The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both deltas as if they were distinct and it effectively duplicates data.-  There is no data duplication - {{getAcidState()}} will use 2 deltas with the same \{{writeid}} range


> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-20901:
----------------------------------
    Description: 
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate

-The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both deltas as if they were distinct and it effectively duplicates data.-  There is no data duplication - {{getAcidState()}} will use 2 deltas with the same \{{writeid}} range

  was:
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate

The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both deltas as if they were distinct and it effectively duplicates data.


> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major
[jira] [Updated] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-20901:
----------------------------------
    Description: 
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate

The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both deltas as if they were distinct and it effectively duplicates data.

  was:
suppose we run minor compaction 2 times, via alter table

The 2nd request to compaction should have nothing to do but I don't think there is a check for that.  It's visible in the context of HIVE-20823, where each compactor run produces a delta with new visibility suffix so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range

this is bad.  Probably happens today as well but new run produces a delta with the same name and clobbers the previous one, which may interfere with writers

need to investigate


> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Major