Hocheol Park created HIVE-22413:
-----------------------------------

             Summary: Avoid dirty read when reading the ACID table while compaction is running
                 Key: HIVE-22413
                 URL: https://issues.apache.org/jira/browse/HIVE-22413
             Project: Hive
          Issue Type: Bug
          Components: Transactions
            Reporter: Hocheol Park

A dirty read can occur when an ACID table is read while the compactor is still creating base or delta directories. It is especially likely on S3 storage, because a “move” on S3 is implemented as “copy and delete”, and the copy takes a long time when the files are large or the bucket count is high.

Here is the proposed logic to avoid this problem. While listing the child directories of the partition directory to read the ACID table, if a “_tmp”-prefixed directory exists, compare the names of the directories inside it with the names of the partition's child directories, and skip any child directory whose name matches. The reader then uses the files that existed before merging, which makes no difference to the results.
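
The skip logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: the class TmpDirFilter, the method readableDirs, and the in-memory representation of the directory listing are all hypothetical, and no Hive or Hadoop API is used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive: filters a partition's child
// directories so that directories still being moved in are not read.
final class TmpDirFilter {
    // Prefix taken from the issue description.
    static final String TMP_PREFIX = "_tmp";

    /**
     * @param children    names of the child directories of the partition directory
     * @param tmpContents for each "_tmp"-prefixed child, the names of the
     *                    directories found inside it (assumed to be listed by the caller)
     * @return the child directories that are safe to read
     */
    static List<String> readableDirs(List<String> children,
                                     Map<String, List<String>> tmpContents) {
        // Collect the names of directories that are still inside a "_tmp" directory.
        Set<String> inFlight = new HashSet<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                inFlight.addAll(tmpContents.getOrDefault(child, Collections.emptyList()));
            }
        }
        List<String> result = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                continue; // never read the temporary directory itself
            }
            if (inFlight.contains(child)) {
                continue; // same name as a directory under "_tmp": possibly half-copied
            }
            result.add(child);
        }
        return result;
    }
}
```

For example, with illustrative children delta_0000001_0000001, delta_0000002_0000002, base_0000002 and _tmp_abc, where _tmp_abc contains base_0000002, the method returns only the two delta directories, so the reader uses the files that existed before merging.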