[jira] [Commented] (HIVE-22413) Avoid dirty read when reading the ACID table while compaction is running

2019-10-30 Thread Abhishek Somani (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962817#comment-16962817
 ] 

Abhishek Somani commented on HIVE-22413:


[~pvary] One issue with HIVE-20823 is that it is available only in 4.0.0 (master). 
Backporting it to Hive 2 or Hive 3 is not feasible because it is a major design 
change. I think we need an interim solution for S3 and other blobstores in older 
Hive versions. 

We solved this in a different way ourselves. At the end of compaction, we 
write a _compaction_done file into the compacted directory, and the readers 
have been modified (in getAcidState()) to ignore base/delta directories until 
this file is visible. 
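
A minimal sketch of the reader-side check, for illustration only. It is not the actual getAcidState() change: it uses plain java.nio rather than the Hadoop FileSystem API, and the class name, method name, and the writtenByCompactor predicate (how a reader recognises compactor output) are all assumptions. Only the _compaction_done marker name comes from the comment above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CompactionMarkerFilter {
    // Marker file name from the comment above; all other names are hypothetical.
    static final String MARKER = "_compaction_done";

    // Returns the sorted names of the partition's child directories that a reader may use.
    // writtenByCompactor is an assumed predicate for recognising compactor output.
    static List<String> readableDirs(Path partition, Predicate<Path> writtenByCompactor)
            throws IOException {
        try (Stream<Path> children = Files.list(partition)) {
            return children
                    .filter(Files::isDirectory)
                    // Ordinary directories pass; compactor output needs the marker.
                    .filter(dir -> !writtenByCompactor.test(dir)
                            || Files.exists(dir.resolve(MARKER)))
                    .map(dir -> dir.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Because the marker is written last, a directory that is still being copied is never visible to readers in this sketch; they keep using the pre-compaction directories until the marker appears.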

> Avoid dirty read when reading the ACID table while compaction is running
> 
>
> Key: HIVE-22413
> URL: https://issues.apache.org/jira/browse/HIVE-22413
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Reporter: Hocheol Park
> Priority: Major
> Attachments: HIVE-22413.1.patch
>
>
> A dirty read can occur when an ACID table is read while the compactor is 
> still creating base or delta directories. This is especially likely on S3 
> storage, because a “move” on S3 is implemented as “copy and delete”, and the 
> copy takes a long time if the files are large or the bucket count is high.
> The proposed logic to avoid this problem: when listing the child directories 
> of a partition while reading the ACID table, if “_tmp”-prefixed directories 
> exist, compare the directory names inside each “_tmp” directory with the 
> partition's children and skip any child with a matching name. The reader then 
> uses the pre-compaction files, and the query results do not change.
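
The check described in the quoted issue could be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: it uses java.nio instead of the Hadoop FileSystem API, and the class and method names are hypothetical. Only the “_tmp” prefix and the name comparison come from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TmpDirFilter {
    // Prefix from the issue description; all other names are hypothetical.
    static final String TMP_PREFIX = "_tmp";

    // Sorted names of the direct children of a directory.
    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Children of the partition that are safe to read while a move may be in progress.
    static List<String> readableDirs(Path partition) throws IOException {
        List<String> children = names(partition);

        // Collect the names still present inside any "_tmp"-prefixed directory.
        Set<String> inProgress = new HashSet<>();
        for (String child : children) {
            Path candidate = partition.resolve(child);
            if (child.startsWith(TMP_PREFIX) && Files.isDirectory(candidate)) {
                inProgress.addAll(names(candidate));
            }
        }

        // Skip the "_tmp" entries themselves and any child whose name matches.
        List<String> readable = new ArrayList<>();
        for (String child : children) {
            if (!child.startsWith(TMP_PREFIX) && !inProgress.contains(child)) {
                readable.add(child);
            }
        }
        return readable;
    }
}
```

With a partition containing delta_1_1, delta_2_2, a partially copied delta_1_2, and _tmp_x/delta_1_2, the sketch returns only delta_1_1 and delta_2_2, so the reader falls back to the files that existed before merging.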



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22413) Avoid dirty read when reading the ACID table while compaction is running

2019-10-29 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962236#comment-16962236
 ] 

Peter Vary commented on HIVE-22413:
---

This kind of problem is supposed to be solved by HIVE-20823.
Running compaction in a transaction will prevent dirty reads of the new folders.



