[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories
[ https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vihang Karajgaonkar updated IMPALA-8663:
Labels: catalog-v2 impala-acid (was: impala-acid)

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Critical
> Labels: catalog-v2, impala-acid
>
> Currently, the file metadata loader recursively lists the table and partition
> directories to get the fileStatuses. For each fileStatus, we ignore hidden
> files in {{FileSystemUtil.isValidDataFile}}(). However, that is not
> sufficient. For instance, if Hive is inserting data into a table when the
> refresh is called, a staging directory may be present within the table
> directory. This staging directory is a hidden directory named
> {{.hive-staging_*}}, and it may contain files that are not themselves hidden
> (that is, whose names do not start with . or _). Such files are temporary
> files and should not be considered valid data files.
>
> Another instance where this happens is transactional tables, which have a
> {{.manifest}} file located in a {{_tmp}} directory within the table
> directory. This file should also be skipped and not considered a valid
> data file.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
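The filtering rule the issue asks for can be sketched in Java as below. This is a minimal sketch under stated assumptions, not Impala's code: the class HiddenDirFilter, its methods, and the sample paths are hypothetical, and it assumes each listed file is given as a '/'-separated path relative to the table or partition directory. It applies the issue's notion of "hidden" (a name starting with . or _) to every path component rather than only to the file name, which is enough to reject visible files under .hive-staging_* and _tmp directories.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the IMPALA-8663 listing rule; not Impala's actual API.
final class HiddenDirFilter {

  // The issue's definition of "hidden": the name starts with '.' or '_'.
  static boolean isHiddenName(String name) {
    return name.startsWith(".") || name.startsWith("_");
  }

  // relPath is '/'-separated and relative to the table or partition directory.
  // Returns false if the file name OR any directory between the root and the
  // file is hidden, so visible files under .hive-staging_* or _tmp are rejected.
  static boolean shouldInclude(String relPath) {
    for (String part : relPath.split("/")) {
      if (part.isEmpty()) {
        continue; // tolerate leading or doubled slashes
      }
      if (isHiddenName(part)) {
        return false;
      }
    }
    return true;
  }

  // Keeps only the paths that pass shouldInclude, preserving listing order.
  static List<String> filterDataFiles(List<String> relPaths) {
    List<String> kept = new ArrayList<>();
    for (String p : relPaths) {
      if (shouldInclude(p)) {
        kept.add(p);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Made-up listing of a table directory during a concurrent Hive insert.
    List<String> listing = List.of(
        "000000_0",
        "year=2019/000001_0",
        ".hive-staging_hive_2019/-ext-10000/000000_0", // visible file, hidden parent
        "_tmp/000000_0.manifest",                      // visible file, _tmp parent
        "delta_0000001_0000001/bucket_00000",
        "year=2019/_SUCCESS");                         // hidden file name
    // Prints [000000_0, year=2019/000001_0, delta_0000001_0000001/bucket_00000]
    System.out.println(filterDataFiles(listing));
  }
}
```

Matching against the path relative to the table or partition root, rather than the absolute path, is deliberate in this sketch: if the warehouse location itself sat below a directory starting with . or _, an absolute-path check would discard every file in the table.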
[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories
[ https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Garg updated IMPALA-8663:
Labels: impala-acid (was: )

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
> Labels: impala-acid
[jira] [Updated] (IMPALA-8663) FileMetadataLoader should skip listing files in hidden and tmp directories
[ https://issues.apache.org/jira/browse/IMPALA-8663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Garg updated IMPALA-8663:
Priority: Critical (was: Major)

> FileMetadataLoader should skip listing files in hidden and tmp directories
> --
>
> Key: IMPALA-8663
> URL: https://issues.apache.org/jira/browse/IMPALA-8663
> Project: IMPALA
> Issue Type: Bug
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Critical
> Labels: impala-acid