Taraka Rama Rao Lethavadla created HIVE-26897:
-------------------------------------------------
Summary: Provide a command/tool to recover data in ACID table when table data got corrupted with invalid/junk delta/delete_delta folders
Key: HIVE-26897
URL: https://issues.apache.org/jira/browse/HIVE-26897
Project: Hive
Issue Type: New Feature
Reporter: Taraka Rama Rao Lethavadla

Example: a table has the following directories:
{noformat}
drwx------ - hive hive 0 2022-11-05 19:43 /data/warehouse/tbl/delete_delta_0080483_0087704_v0973185
drwx------ - hive pdl_prod_nosh_jsin 0 2022-12-05 00:18 /data/warehouse/tbl/delete_delta_0080483_0088384_v1111507
{noformat}
When we read data from this table, we get the following error:
{noformat}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Duplicate key null (attempted merging values org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@41776cd9 and org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@1404a054)
{noformat}
Both delete_delta_0080483_0087704_v0973185 and delete_delta_0080483_0088384_v1111507 were created by minor compaction. Normally, once a minor compaction completes, the next minor compaction picks a min_writeId greater than the max_writeId of the previously compacted directory. In this case, both minor-compacted directories have the same min_writeId (0080483).

To mitigate the issue, we had to remove those directories manually from HDFS, create a fresh table from the remaining data, drop the original table, and rename the fresh table to the original table's name.

*Proposal*

Create a tool/command that reads the data from a corrupted ACID table, so that the data can be recovered before any changes are made to the underlying files. We could then work around the problem by creating another table with the same data.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
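
The pattern described in the report (two compacted delete_delta directories that share a min_writeId but have different max_writeId values) can be looked for with a small script before any directory is removed. The sketch below is only an illustration and is not part of Hive: the function names are hypothetical, and the regular expression assumes directory names of the form (delete_)delta_<minWriteId>_<maxWriteId>, optionally followed by a statement ID and a _v<visibilityTxnId> suffix, as seen in the listing above. It works on a plain list of paths, for example the output of a directory listing, and does not touch the file system.

```python
import re
from collections import defaultdict

# Assumed naming layout (not taken from Hive source code):
#   (delete_)delta_<minWriteId>_<maxWriteId>[_<statementId>][_v<visibilityTxnId>]
_DELTA_RE = re.compile(
    r"^(?P<kind>delete_delta|delta)_(?P<min>\d+)_(?P<max>\d+)"
    r"(?:_(?P<stmt>\d+))?(?:_v(?P<visibility>\d+))?$"
)


def parse_delta_dir(path):
    """Return (kind, min_write_id, max_write_id), or None if the last
    path component does not look like a delta directory."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = _DELTA_RE.match(name)
    if match is None:
        return None
    return match.group("kind"), int(match.group("min")), int(match.group("max"))


def find_suspect_delta_dirs(paths):
    """Group directories by (kind, min_write_id) and keep only the groups
    whose members disagree on max_write_id, as in the reported case."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_delta_dir(path)
        if parsed is None:
            continue  # base directories and unrelated files are ignored
        kind, min_id, max_id = parsed
        groups[(kind, min_id)].append((max_id, path))

    suspects = {}
    for key, members in groups.items():
        if len({max_id for max_id, _ in members}) > 1:
            suspects[key] = [path for _, path in members]
    return suspects


if __name__ == "__main__":
    listing = [
        "/data/warehouse/tbl/delete_delta_0080483_0087704_v0973185",
        "/data/warehouse/tbl/delete_delta_0080483_0088384_v1111507",
    ]
    for (kind, min_id), dirs in find_suspect_delta_dirs(listing).items():
        print(f"{kind} directories sharing min_writeId {min_id}:")
        for d in dirs:
            print(f"  {d}")
```

This is a heuristic only. Directories that share both min_writeId and max_writeId (for example, deltas that differ only in statement ID) are deliberately not flagged, and a flagged group still needs manual review before anything is deleted.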