It won't work until this is merged: https://github.com/apache/spark/pull/3407
On Wed, Dec 3, 2014 at 9:25 AM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
> Hi folks,
>
> I'm wondering if someone has successfully used wildcards with a
> parquetFile call?
>
> I saw this thread and it makes me think no?
> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3CCACA1tWLjcF-NtXj=pqpqm3xk4aj0jitxjhmdqbojj_ojybo...@mail.gmail.com%3E
>
> I have a set of parquet files that are partitioned by key. I'd like to
> issue a query to read in a subset of the files, based on a directory
> wildcard (the wildcard will be a little more specific than * but this is
> to show the issue):
>
> This call works fine:
>
> sc.textFile("hdfs:///warehouse/hive/*/*/*.parquet").first
> res4: String = PAR1... (raw Parquet bytes rendered as an unreadable string)
>
> but this doesn't:
>
> scala> val parquetFile =
>   sqlContext.parquetFile("hdfs:///warehouse/hive/*/*/*.parquet").first
> java.io.FileNotFoundException: File
> hdfs://cdh4-14822-nn/warehouse/hive/*/*/*.parquet does not exist
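
Until that PR is merged, one possible workaround is to expand the wildcard yourself and union the per-path reads. This is only a sketch, assuming a Spark 1.x spark-shell session where `sc` and `sqlContext` are already bound and the Hadoop FileSystem API is on the classpath:

```scala
// Sketch of a workaround (untested here): expand the glob with the Hadoop
// FileSystem API instead of passing the wildcard to parquetFile directly.
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(sc.hadoopConfiguration)

// globStatus resolves the wildcard into concrete file paths on HDFS.
val matched = fs.globStatus(new Path("hdfs:///warehouse/hive/*/*/*.parquet"))
  .map(_.getPath.toString)

// parquetFile handles one concrete path at a time; unionAll stitches the
// resulting SchemaRDDs back into a single one.
val combined = matched.map(sqlContext.parquetFile).reduceLeft(_ unionAll _)
combined.first
```

The trade-off is that every matched file is read through a separate `parquetFile` call, so this only works if all the files share a compatible schema.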