[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480127#comment-16480127 ]

Xiao Li commented on SPARK-16317:
---------------------------------

We will not improve FileFormat since we are migrating the implementation to the data source v2.

> Add file filtering interface for FileFormat
> -------------------------------------------
>
>                 Key: SPARK-16317
>                 URL: https://issues.apache.org/jira/browse/SPARK-16317
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Cheng Lian
>            Priority: Minor
>
> {{FileFormat}} data sources like Parquet and Avro (provided by spark-avro)
> have customized file filtering logics. For example, Parquet needs to filter
> out summary files, while Avro provides a Hadoop configuration option to
> filter out all files whose names don't end with ".avro".
> It would be nice to have a general file filtering interface in {{FileFormat}}
> to handle similar requirements.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360779#comment-15360779 ]

Apache Spark commented on SPARK-16317:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/14038
[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359051#comment-15359051 ]

Takeshi Yamamuro commented on SPARK-16317:
------------------------------------------

Yeah, I see. I'll check the code to fix this.
[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358975#comment-15358975 ]

Cheng Lian commented on SPARK-16317:
------------------------------------

The motivation is to filter out input data files so that {{buildReader}} won't receive unwanted files. Doing filtering while inferring schema doesn't help.
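
The point in this comment can be illustrated with a small sketch. It is not Spark code: the object, the predicate, and both helper functions are hypothetical, and plain file-name strings stand in for the file listing. Each helper returns a pair of (files seen by schema inference, files seen by the reader).

```scala
object ReaderInputSketch {
  // Hypothetical predicate: treat only names ending in ".avro" as data files.
  val isDataFile: String => Boolean = name => name.endsWith(".avro")

  // Filtering only where the schema is inferred: the reader side
  // still receives every listed file, including unwanted ones.
  def filterAtInferenceOnly(listed: Seq[String]): (Seq[String], Seq[String]) =
    (listed.filter(isDataFile), listed)

  // Filtering the listing itself: schema inference and the reader
  // both consume the same filtered set.
  def filterAtListing(listed: Seq[String]): (Seq[String], Seq[String]) = {
    val kept = listed.filter(isDataFile)
    (kept, kept)
  }
}
```

Under these assumptions, only the second variant keeps a file such as `_SUCCESS` away from the reader, which appears to be the requirement stated above.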
[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358235#comment-15358235 ]

Takeshi Yamamuro commented on SPARK-16317:
------------------------------------------

Is this intended to be a Hadoop PathFilter-like interface?
How about adding the code below to DataSource#inferFileFormatSchema?
{code}
// getPassFilter is the method proposed here; it does not exist on FileFormat yet
val passFilter = format.getPassFilter
format.inferSchema(
  sparkSession,
  caseInsensitiveOptions,
  fileCatalog.allFiles(passFilter))
{code}
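
A minimal, self-contained sketch of what such a PathFilter-like hook might look like follows. Everything in it is an assumption made for illustration: `PathFilterLike`, `FilteringFormat`, `ParquetLike`, and the `allFiles` helper are hypothetical stand-ins, not Spark or Hadoop types, and paths are plain strings. The Parquet example skips the summary files `_metadata` and `_common_metadata`.

```scala
object PathFilterSketch {
  // Stand-in for a Hadoop PathFilter-like contract: one predicate over a path.
  trait PathFilterLike {
    def accept(path: String): Boolean
  }

  // Hypothetical slice of a file format exposing the proposed getPassFilter.
  trait FilteringFormat {
    def getPassFilter: PathFilterLike
  }

  // Parquet-style example: reject summary files by base name.
  object ParquetLike extends FilteringFormat {
    private val summaryFiles = Set("_metadata", "_common_metadata")

    val getPassFilter: PathFilterLike = new PathFilterLike {
      def accept(path: String): Boolean = {
        // Base name is everything after the last '/', or the whole string.
        val baseName = path.substring(path.lastIndexOf('/') + 1)
        !summaryFiles.contains(baseName)
      }
    }
  }

  // Stand-in for a catalog listing that takes the filter as an argument.
  def allFiles(listing: Seq[String], filter: PathFilterLike): Seq[String] =
    listing.filter(p => filter.accept(p))
}
```

With this shape, a call like `PathFilterSketch.allFiles(paths, PathFilterSketch.ParquetLike.getPassFilter)` would hand only data files to whatever consumes the listing.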
[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat
[ https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356605#comment-15356605 ]

Sean Owen commented on SPARK-16317:
-----------------------------------

The JDK already provides FilenameFilter; probably just use that.
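
For reference, a small sketch of the suggestion using `java.io.FilenameFilter` from Scala. The object and member names are illustrative only, and the ".avro" rule simply mirrors the Avro behavior described in the issue.

```scala
import java.io.{File, FilenameFilter}

object AvroNameFilterSketch {
  // Accepts only names ending in ".avro"; the directory argument is ignored.
  val avroOnly: FilenameFilter = new FilenameFilter {
    override def accept(dir: File, name: String): Boolean =
      name.endsWith(".avro")
  }

  // Applies the filter to a plain list of names under the given directory.
  def keep(dir: File, names: Seq[String]): Seq[String] =
    names.filter(name => avroOnly.accept(dir, name))
}
```

One caveat: `FilenameFilter` is defined in terms of `java.io.File`, while Spark's file-based sources list files through Hadoop's `FileSystem` API, so a filter over Hadoop paths may be a closer fit for this issue.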