I wonder if that’s the problem. Is there an equivalent hadoop fs -ls command you can run that returns the same files you want but doesn’t have that month= string?
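As a quick local sanity check on the pattern itself, here is a minimal sketch, not Spark or Hadoop code. It uses Python's fnmatch as a stand-in for Hadoop's glob matcher, which is only an approximation. The directory names are hypothetical, shaped like the Hive-style partitions in this thread. The point is just that "=" is an ordinary character in a glob and "*" is the only wildcard in month=2014*.

```python
from fnmatch import fnmatchcase

# Hypothetical entries under .../parquet/table/ (assumed for illustration,
# not taken from the real cluster).
partitions = ["month=201312", "month=201401", "month=201402", "_SUCCESS"]

# Same trailing pattern as in the path from the thread.
pattern = "month=2014*"

# fnmatchcase is only a stand-in for Hadoop's glob matching: in both,
# "*" matches any run of characters and "=" is matched literally.
matched = [name for name in partitions if fnmatchcase(name, pattern)]

print(matched)  # ['month=201401', 'month=201402']
```

If the pattern matches as expected here, the month= string is unlikely to be what breaks the glob, and the next thing to compare is what hadoop fs -ls returns for the same path (directories versus the files inside them).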
On Wed, Jun 18, 2014 at 12:25 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Hi Nicholas,
>
> month= is for Hive to auto discover the partitions. It's part of the url
> of my files.
>
> Jianshi
>
> On Wed, Jun 18, 2014 at 11:52 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Is that month= syntax something special, or do your files actually have
>> that string as part of their name?
>>
>> On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the reply. I'm using parquetFile as input, is that a problem?
>>> In hadoop fs -ls, the path
>>> (hdfs://domain/user/jianshuang/data/parquet/table/month=2014*)
>>> will get list all the files.
>>>
>>> I'll test it again.
>>>
>>> Jianshi
>>>
>>> On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> Strangely in my spark (1.0.0 compiled against hadoop 2.4.0) log, it
>>>> says file not found. I'll try again.
>>>>
>>>> Jianshi
>>>>
>>>> On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash <and...@andrewash.com> wrote:
>>>>
>>>>> In Spark you can use the normal globs supported by Hadoop's
>>>>> FileSystem, which are documented here:
>>>>> http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
>>>>>
>>>>> On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW <meethu2...@yahoo.co.in> wrote:
>>>>>
>>>>>> Hi Jianshi,
>>>>>>
>>>>>> I have used wild card characters (*) in my program and it worked..
>>>>>> My code was like this
>>>>>> b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Meethu M
>>>>>>
>>>>>> On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>>>>>
>>>>>> It would be convenient if Spark's textFile, parquetFile, etc. can
>>>>>> support path with wildcard, such as:
>>>>>>
>>>>>> hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
>>>>>>
>>>>>> Or is there already a way to do it now?
>>>>>>
>>>>>> Jianshi
>>>>>>
>>>>>> --
>>>>>> Jianshi Huang
>>>>>>
>>>>>> LinkedIn: jianshi
>>>>>> Twitter: @jshuang
>>>>>> Github & Blog: http://huangjs.github.com/