I see that there is already a request to add wildcard support to the
SQLContext.parquetFile function
https://issues.apache.org/jira/browse/SPARK-3928.

For our use case it would be useful to associate the directory structure
with certain columns in the table, but this does not appear to be
supported.

For example, we want to create Parquet files on a daily basis, associated
with geographic regions, so we will create a set of files under directories
such as:

* 2014-12-29/Americas
* 2014-12-29/Asia
* 2014-12-30/Americas
* ...
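The mapping itself is easy to express outside of Spark. A minimal sketch in
plain Python (the path below is a made-up example, and the column names
"date" and "region" are my own labels for the two directory levels):

```python
# Derive column values from the directory layout described above.
# Assumes paths of the form "<date>/<region>/<file>".
def path_to_columns(path):
    date, region = path.split("/")[:2]
    return {"date": date, "region": region}

print(path_to_columns("2014-12-29/Americas/part-0.parquet"))
# {'date': '2014-12-29', 'region': 'Americas'}
```

One could apply this per input directory and attach the resulting values as
extra columns when registering the data as a table, though that is a manual
workaround rather than built-in support.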

When a query has predicates on column values that can be determined from
the directory structure, it would be good to read data only from the
matching files.
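Until something like that exists, the pruning could be done by hand: filter
the directory list against the predicate first, and pass only the surviving
paths to SQLContext.parquetFile. A rough sketch of the path-filtering step
(plain Python; the path list and predicate are invented for illustration):

```python
paths = [
    "2014-12-29/Americas",
    "2014-12-29/Asia",
    "2014-12-30/Americas",
]

# Keep only directories whose region component matches the predicate,
# so the non-matching Parquet files are never opened.
def prune(paths, region):
    return [p for p in paths if p.split("/")[1] == region]

print(prune(paths, "Americas"))
# ['2014-12-29/Americas', '2014-12-30/Americas']
```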

Does anyone know if something like this is supported, or whether this is a
reasonable thing to request?

Mick

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Mapping-directory-structure-to-columns-in-SparkSQL-tp20880.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
