[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2018-05-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480127#comment-16480127
 ] 

Xiao Li commented on SPARK-16317:
-

We will not improve FileFormat, since we are migrating the implementation to 
Data Source V2.

> Add file filtering interface for FileFormat
> ---
>
> Key: SPARK-16317
> URL: https://issues.apache.org/jira/browse/SPARK-16317
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Cheng Lian
> Priority: Minor
>
> {{FileFormat}} data sources like Parquet and Avro (provided by spark-avro) 
> have custom file filtering logic. For example, Parquet needs to filter 
> out summary files, while Avro provides a Hadoop configuration option to 
> filter out all files whose names don't end with ".avro".
> It would be nice to have a general file filtering interface in {{FileFormat}} 
> to handle similar requirements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-07-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15360779#comment-15360779
 ] 

Apache Spark commented on SPARK-16317:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/14038




[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-07-01 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15359051#comment-15359051
 ] 

Takeshi Yamamuro commented on SPARK-16317:
--

Yes, I see your point. I'll check the code and fix this.







[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-07-01 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358975#comment-15358975
 ] 

Cheng Lian commented on SPARK-16317:


The motivation is to filter out input data files so that {{buildReader}} won't 
receive unwanted files. Doing filtering while inferring schema doesn't help.
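
A minimal sketch of that distinction, in plain Scala with no Spark dependency. 
SimpleFormat, acceptFile, inferSchema and buildReader below are hypothetical 
stand-ins, not Spark's actual FileFormat API. The point is only that the filter 
is applied once to the file listing, so the reader never receives the unwanted 
files, instead of being applied inside schema inference alone.

```scala
// Hypothetical stand-in for a file-based data source; NOT Spark's FileFormat.
trait SimpleFormat {
  // Return true for files that should be treated as input data.
  def acceptFile(fileName: String): Boolean = true

  // Stand-ins for schema inference and reading. Each one simply returns
  // the files it was handed, so the effect of filtering is easy to see.
  def inferSchema(files: Seq[String]): Seq[String] = files
  def buildReader(files: Seq[String]): Seq[String] = files
}

// A Parquet-like format that skips summary files.
object ParquetLike extends SimpleFormat {
  private val summaryFiles = Set("_metadata", "_common_metadata")

  override def acceptFile(fileName: String): Boolean =
    !summaryFiles.contains(fileName)
}

object FilterBeforeRead {
  // Apply the format's filter once, at listing time, so that every later
  // step (schema inference and reading) sees the same filtered set.
  def listInputFiles(format: SimpleFormat, allFiles: Seq[String]): Seq[String] =
    allFiles.filter(format.acceptFile)

  def main(args: Array[String]): Unit = {
    val allFiles = Seq(
      "part-00000.parquet", "_metadata", "part-00001.parquet", "_common_metadata")
    val inputs = listInputFiles(ParquetLike, allFiles)
    println(ParquetLike.buildReader(inputs))
  }
}
```

If the filter were applied only inside inferSchema, buildReader would still be 
handed the full, unfiltered listing.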




[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-06-30 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15358235#comment-15358235
 ] 

Takeshi Yamamuro commented on SPARK-16317:
--

Is the intent a Hadoop PathFilter-like interface?
How about adding the code below to DataSource#inferFileFormatSchema?
{code}
val passFilter = format.getPassFilter
format.inferSchema(
  sparkSession,
  caseInsensitiveOptions,
  fileCatalog.allFiles(passFilter))
{code}




[jira] [Commented] (SPARK-16317) Add file filtering interface for FileFormat

2016-06-30 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356605#comment-15356605
 ] 

Sean Owen commented on SPARK-16317:
---

The JDK already provides FilenameFilter; probably just use that.
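
For illustration, a rough sketch of what using java.io.FilenameFilter for the 
Avro case could look like. FilenameFilterSketch, avroOnly and keep are 
hypothetical names made up for this example; only FilenameFilter and File come 
from the JDK. One thing to keep in mind is that accept takes a java.io.File 
directory, while Spark lists input files through the Hadoop FileSystem API, so 
the directory argument below is just a placeholder.

```scala
import java.io.{File, FilenameFilter}

// Hypothetical helper object; not part of Spark or spark-avro.
object FilenameFilterSketch {
  // Accept only names ending in ".avro" (the Avro case from the description).
  val avroOnly: FilenameFilter = new FilenameFilter {
    override def accept(dir: File, name: String): Boolean =
      name.endsWith(".avro")
  }

  // Apply a filter to bare file names. The directory passed to accept is a
  // placeholder because the filter above only looks at the name.
  def keep(filter: FilenameFilter, names: Seq[String]): Seq[String] = {
    val placeholderDir = new File(".")
    names.filter(name => filter.accept(placeholderDir, name))
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("events-1.avro", "_SUCCESS", "notes.txt", "events-2.avro")
    println(keep(avroOnly, names))
  }
}
```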
