[ https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962817#comment-16962817 ]
Abhishek Somani commented on HIVE-22413:
[~pvary] One issue with HIVE-20823 is that it is available only in 4.0.0
(master). Backporting it to Hive 2 or Hive 3 is not feasible because it is a
major design change. I think we need an interim solution for S3 and other
blobstores in older Hive versions.
We solved this in a different way ourselves. At the end of compaction, we
write a _compaction_done file into the compacted directory, and the readers
have been modified (in getAcidState()) to ignore a compacted base/delta
directory until this file is visible.
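
A minimal sketch of that reader-side check, to make the idea concrete. It is not the actual change to getAcidState(): it uses java.nio on a local directory as a stand-in for the Hadoop FileSystem API, and the class name, the visibleDirs method and the isCompactorOutput predicate (how a reader decides that a directory was written by the compactor) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CompactionMarkerFilter {
    // Marker file name from the comment above.
    static final String MARKER = "_compaction_done";

    /**
     * Returns the names of the child directories a reader may use, sorted.
     * A directory flagged as compactor output is hidden until it contains
     * the marker file; every other directory is always visible.
     * isCompactorOutput is an assumption of this sketch.
     */
    static List<String> visibleDirs(Path partition, Predicate<String> isCompactorOutput)
            throws IOException {
        try (Stream<Path> children = Files.list(partition)) {
            return children
                .filter(Files::isDirectory)
                .filter(dir -> !isCompactorOutput.test(dir.getFileName().toString())
                        || Files.exists(dir.resolve(MARKER)))
                .map(dir -> dir.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

The marker must be the last thing the compactor writes, so that its presence implies all data files in the directory have already been copied.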
> Avoid dirty read when reading the ACID table while compaction is running
>
>
> Key: HIVE-22413
> URL: https://issues.apache.org/jira/browse/HIVE-22413
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Hocheol Park
> Priority: Major
> Attachments: HIVE-22413.1.patch
>
>
> Dirty reads can occur when an ACID table is read while the compactor is
> still creating base or delta directories. This is especially likely on S3,
> because a “move” on S3 is implemented as “copy and delete”, and the copy
> takes a long time when the files are large or the bucket count is high.
> The proposed logic to avoid the problem: when listing the child directories
> of a partition while reading an ACID table, if “_tmp”-prefixed directories
> exist, compare the directory names inside each “_tmp” directory with the
> names in the partition and skip any directory whose name matches. The reader
> then uses the files from before the merge, so the results are unchanged.
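
The skip logic in the description can be sketched roughly as follows. This is one reading of the description, not the code in HIVE-22413.1.patch: it assumes each “_tmp”-prefixed directory holds entries named like the directories being moved into the partition, it uses java.nio on a local directory in place of the Hadoop FileSystem API, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TmpDirSkipFilter {
    // Prefix of the staging directories mentioned in the description.
    static final String TMP_PREFIX = "_tmp";

    /** Returns the partition's child directory names that are safe to read, sorted. */
    static List<String> readableDirs(Path partition) throws IOException {
        List<Path> children;
        try (Stream<Path> s = Files.list(partition)) {
            children = s.filter(Files::isDirectory).collect(Collectors.toList());
        }

        Set<String> inProgress = new HashSet<>();   // names still being copied
        List<String> candidates = new ArrayList<>();
        for (Path child : children) {
            String name = child.getFileName().toString();
            if (name.startsWith(TMP_PREFIX)) {
                // Assumed layout: the staging directory lists the names in flight.
                try (Stream<Path> staged = Files.list(child)) {
                    staged.forEach(p -> inProgress.add(p.getFileName().toString()));
                }
            } else {
                candidates.add(name);
            }
        }

        // Skip any partition directory whose name also appears under a staging directory.
        candidates.removeAll(inProgress);
        Collections.sort(candidates);
        return candidates;
    }
}
```

With a partition containing base_5, delta_1_1, delta_2_2 and _tmp_x/base_5, the sketch hides base_5 and leaves the two deltas, so the reader falls back to the pre-compaction files.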