Hocheol Park created HIVE-22413:
-----------------------------------

             Summary: Avoid dirty read when reading the ACID table while 
compaction is running
                 Key: HIVE-22413
                 URL: https://issues.apache.org/jira/browse/HIVE-22413
             Project: Hive
          Issue Type: Bug
          Components: Transactions
            Reporter: Hocheol Park


A dirty read can occur when an ACID table is read while the compactor is 
still creating base or delta directories. This is especially likely on S3 
storage, because a “move” on S3 is implemented as “copy and delete”, and the 
copy takes a long time when the files are large or the bucket count is high.

The proposed logic to avoid this problem is as follows. While listing the 
child directories of a partition to read the ACID table, if any 
“_tmp”-prefixed directories exist in the partition directory, compare each 
child directory name with the names of the directories inside the “_tmp” one 
and skip the child when the names match. The reader then uses the files from 
before the merge, so the query results are unchanged.
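
A minimal sketch of that filtering step, in plain Java with no Hadoop or Hive 
classes. Everything here is hypothetical and for illustration only: the class 
name AcidDirFilter, the method readableDirs, the map that stands in for a 
listing of each “_tmp” directory, and the example directory names are all 
assumptions, not Hive's actual code. The sketch also assumes the “_tmp” 
directories themselves are never read.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Hive: models the partition listing as
// plain strings so the skip rule can be shown without a file system.
final class AcidDirFilter {
    static final String TMP_PREFIX = "_tmp";

    // children:    names of the directories directly under the partition
    // tmpContents: for each "_tmp"-prefixed child, the names of the
    //              directories found inside it (assumed listing result)
    static List<String> readableDirs(List<String> children,
                                     Map<String, List<String>> tmpContents) {
        // Collect the names the compactor is still moving into place.
        Set<String> inFlight = new HashSet<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                inFlight.addAll(tmpContents.getOrDefault(child, List.of()));
            }
        }

        List<String> readable = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                continue; // assumption: temporary directories are never read
            }
            if (inFlight.contains(child)) {
                continue; // same name exists under "_tmp": still being copied
            }
            readable.add(child);
        }
        return readable;
    }

    public static void main(String[] args) {
        List<String> children = List.of(
            "delta_0000001_0000001",
            "delta_0000002_0000002",
            "base_0000002",      // partially copied compaction output
            "_tmp_compactor");   // hypothetical temporary directory name
        Map<String, List<String>> tmpContents =
            Map.of("_tmp_compactor", List.of("base_0000002"));

        // prints [delta_0000001_0000001, delta_0000002_0000002]
        System.out.println(readableDirs(children, tmpContents));
    }
}
```

In this example the reader falls back to the two delta directories, which 
hold the same rows as the unfinished base directory.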



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
