Taraka Rama Rao Lethavadla created HIVE-26897:
-------------------------------------------------

             Summary: Provide a command/tool to recover data in ACID table when 
table data got corrupted with invalid/junk delta/delete_delta folders 
                 Key: HIVE-26897
                 URL: https://issues.apache.org/jira/browse/HIVE-26897
             Project: Hive
          Issue Type: New Feature
            Reporter: Taraka Rama Rao Lethavadla


Example: a table has the following directories:
{noformat}
drwx------ - hive hive 0 2022-11-05 19:43 /data/warehouse/tbl/delete_delta_0080483_0087704_v0973185
drwx------ - hive pdl_prod_nosh_jsin 0 2022-12-05 00:18 /data/warehouse/tbl/delete_delta_0080483_0088384_v1111507
{noformat}
Reading data from this table fails with the following error:
{noformat}
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: 
Duplicate key null (attempted merging values 
org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@41776cd9 and 
org.apache.hadoop.hive.ql.io.AcidInputFormat$DeltaFileMetaData@1404a054){noformat}
delete_delta_0080483_0087704_v0973185 and delete_delta_0080483_0088384_v1111507 
were both created by minor compaction. Normally, once a minor compaction 
completes, the next minor compaction picks a min_writeId greater than the 
max_writeId of the previous compaction. In this case, both minor-compacted 
directories have the same min_writeId (0080483).
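
The condition described above (two compacted directories sharing a min_writeId) can be checked from a directory listing. The following is a minimal sketch in Python, not part of Hive: the helper names parse_delta_dir and find_same_min_write_id are hypothetical, and the pattern only assumes the name layout visible in the listing above, i.e. <prefix>_<min_writeId>_<max_writeId>_v<suffix>. Names that do not match that layout are skipped.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the directory names in this issue:
#   <delta|delete_delta>_<min_writeId>_<max_writeId>[_v<suffix>]
# Other directory name forms are not handled by this sketch.
DELTA_RE = re.compile(r"^(delete_delta|delta)_(\d+)_(\d+)(?:_v(\d+))?$")


def parse_delta_dir(path):
    """Return (kind, min_write_id, max_write_id, name), or None if the name does not match."""
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = DELTA_RE.match(name)
    if match is None:
        return None
    kind, min_id, max_id, _suffix = match.groups()
    return kind, int(min_id), int(max_id), name


def find_same_min_write_id(paths):
    """Group directories by (kind, min_writeId) and keep groups with more than one entry."""
    groups = defaultdict(list)
    for path in paths:
        parsed = parse_delta_dir(path)
        if parsed is None:
            continue  # not a delta/delete_delta directory in the assumed layout
        kind, min_id, _max_id, name = parsed
        groups[(kind, min_id)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


paths = [
    "/data/warehouse/tbl/delete_delta_0080483_0087704_v0973185",
    "/data/warehouse/tbl/delete_delta_0080483_0088384_v1111507",
]
conflicts = find_same_min_write_id(paths)
for (kind, min_id), names in conflicts.items():
    print(kind, min_id, names)
```

Such a check only reports candidates for inspection. It does not decide which directory is valid and does not modify anything on HDFS.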

To mitigate the issue, we had to remove those directories manually from HDFS, 
create a fresh table from the remaining data, drop the original table, and 
rename the fresh table to the original table name.

*Proposal*

Create a tool/command that reads data from the corrupted ACID table, so that 
the data can be recovered before any changes are made to the underlying files. 
We can then work around the problem by creating another table with the same 
data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
