[ 
https://issues.apache.org/jira/browse/HIVE-24021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175494#comment-17175494
 ] 

Karen Coppage commented on HIVE-24021:
--------------------------------------

HIVE-24023 is also required for reading insert-only Parquet tables truncated
by Impala.

> Read insert-only tables truncated by Impala correctly
> -----------------------------------------------------
>
>                 Key: HIVE-24021
>                 URL: https://issues.apache.org/jira/browse/HIVE-24021
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Impala truncates insert-only tables by writing a base directory containing an
> empty file named "_empty". (Hive should do the same; see HIVE-20137.) In
> general, a file name beginning with an underscore marks a temporary file in
> Hive, one that operations other than the one that created it should not read.
> Before HIVE-23495, getAcidState listed each directory in the table
> (HdfsUtils#listLocatedStatus) and filtered out directories whose names begin
> with an underscore or period, since these are presumably temporary. This
> allowed files named "_empty" to be read, because Hive checked the directory
> name and not the file name.
> After HIVE-23495, we recursively list each file in the table
> (AcidUtils#getHdfsDirSnapshots) with a filter that rejects files whose names
> begin with an underscore or period, since these are presumably temporary. As
> a result, the "_empty" file is filtered out and Hive reads the table data as
> if the truncate operation had not happened.
> Since performance in getAcidState is important, the best solution is probably
> to make an exception in the filter and accept files named "_empty".
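A minimal sketch of the proposed filter exception, written as plain Java predicates over file names. This is not the actual Hive patch: the real filter operates on Hadoop paths inside AcidUtils, and the class name, predicate names and table layout below are hypothetical, chosen only to show why a file-name filter drops "_empty" and how a single exception restores it.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not Hive's AcidUtils code.
final class EmptyFileFilterSketch {
    // Marker file that Impala writes into the new base directory on truncate.
    static final String EMPTY_MARKER = "_empty";

    // File-name filter as described for post-HIVE-23495 listing:
    // names starting with '_' or '.' are treated as temporary and rejected.
    static final Predicate<String> HIDDEN_FILTER =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Proposed exception: keep rejecting temporary files, but accept "_empty".
    static final Predicate<String> HIDDEN_FILTER_WITH_EMPTY =
        name -> name.equals(EMPTY_MARKER) || HIDDEN_FILTER.test(name);

    // Last path component, i.e. the file name the filter is applied to.
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? path : path.substring(slash + 1);
    }

    // Keeps only the paths whose file name passes the given filter.
    static List<String> accepted(List<String> paths, Predicate<String> filter) {
        return paths.stream()
            .filter(p -> filter.test(fileName(p)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical table layout after an Impala truncate (relative to the table root).
        List<String> files = List.of(
            "delta_0000001_0000001_0000/000000_0",
            "base_0000002/_empty",
            "delta_0000001_0000001_0000/_tmp_file");

        // Without the exception the truncate's base file is invisible.
        System.out.println(accepted(files, HIDDEN_FILTER));
        // With the exception "_empty" is listed, while "_tmp_file" is still skipped.
        System.out.println(accepted(files, HIDDEN_FILTER_WITH_EMPTY));
    }
}
```

Running main prints `[delta_0000001_0000001_0000/000000_0]` and then `[delta_0000001_0000001_0000/000000_0, base_0000002/_empty]`. Matching the exact name, rather than relaxing the underscore rule, keeps the filter a cheap string comparison, in line with the performance concern above.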



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
