I see there is already a request to add wildcard support to the SQLContext.parquetFile function: https://issues.apache.org/jira/browse/SPARK-3928.
What would be useful for our use case is to associate the directory structure with certain columns in the table, but this does not seem to be supported. For example, we want to create Parquet files on a daily basis, associated with geographic regions, and so will create sets of files under directories such as:

* 2014-12-29/Americas
* 2014-12-29/Asia
* 2014-12-30/Americas
* ...

Where a query has predicates that match the column values determinable from the directory structure, it would be good to extract data only from the matching files. Does anyone know if something like this is supported, or whether this is a reasonable thing to request?

Mick

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Mapping-directory-structure-to-columns-in-SparkSQL-tp20880.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
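In case it helps the discussion: one manual workaround is to read each directory separately, tag the rows with the values encoded in the path, and union the results. This is only a rough sketch against the Spark 1.2-era Scala API; the `/data/...` paths, the `date`/`region` column names, and the existing SparkContext `sc` are all assumptions, not anything from the Spark docs.

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // `sc`: an existing SparkContext (assumed)

// The (date, region) pairs encoded in the directory layout above.
val partitions = Seq(
  ("2014-12-29", "Americas"),
  ("2014-12-29", "Asia"),
  ("2014-12-30", "Americas"))

// Prune directories up front using the predicate on the path-derived
// values, so only matching files are ever read:
val matching = partitions.filter { case (date, _) => date == "2014-12-29" }

val combined = matching.map { case (date, region) =>
  val raw = sqlContext.parquetFile(s"/data/$date/$region")
  val table = s"part_${date.replace("-", "_")}_$region"
  raw.registerTempTable(table)
  // Add the path-derived values back as ordinary columns.
  sqlContext.sql(s"SELECT *, '$date' AS date, '$region' AS region FROM $table")
}.reduce(_ unionAll _)

combined.registerTempTable("events")  // query as one logical table
```

The pruning happens at job-construction time rather than inside the query planner, so the predicate has to be expressed twice (once in the filter above, once in any SQL you run), but it does avoid touching non-matching files.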