I wonder if that’s the problem. Is there an equivalent hadoop fs -ls
command you can run that returns the same files you want but doesn’t have
that month= string?


On Wed, Jun 18, 2014 at 12:25 PM, Jianshi Huang <jianshi.hu...@gmail.com>
wrote:

> Hi Nicholas,
>
> month= is for Hive to auto-discover the partitions. It's part of the URL
> of my files.
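>
> For example, the files sit under Hive-style partition directories (the
> month values below are just an illustration):
>
>   hdfs://domain/user/jianshuang/data/parquet/table/month=201405/
>   hdfs://domain/user/jianshuang/data/parquet/table/month=201406/
>
> so the month=2014* wildcard is meant to match those partition directories.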
>
> Jianshi
>
>
> On Wed, Jun 18, 2014 at 11:52 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> Is that month= syntax something special, or do your files actually have
>> that string as part of their name?
>>
>>
>> On Wed, Jun 18, 2014 at 2:25 AM, Jianshi Huang <jianshi.hu...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the reply. I'm using parquetFile as input; is that a problem?
>>> In hadoop fs -ls, the path
>>> (hdfs://domain/user/jianshuang/data/parquet/table/month=2014*) lists all
>>> the files.
>>>
>>> I'll test it again.
>>>
>>> Jianshi
>>>
>>>
>>> On Wed, Jun 18, 2014 at 2:23 PM, Jianshi Huang <jianshi.hu...@gmail.com>
>>> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> Strangely, in my Spark (1.0.0, compiled against Hadoop 2.4.0) log it
>>>> says the file was not found. I'll try again.
>>>>
>>>> Jianshi
>>>>
>>>>
>>>> On Wed, Jun 18, 2014 at 12:36 PM, Andrew Ash <and...@andrewash.com>
>>>> wrote:
>>>>
>>>>> In Spark you can use the normal globs supported by Hadoop's
>>>>> FileSystem, which are documented here:
>>>>> http://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)
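>>>>>
>>>>> For example (a rough PySpark sketch with placeholder paths), those glob
>>>>> patterns can go directly into the path you pass to textFile:
>>>>>
>>>>> # sc is the SparkContext from the pyspark shell.
>>>>> # '*' matches within a single path component, '?' matches one character,
>>>>> # '[a-b]' matches a character range, and '{x,y}' matches alternatives.
>>>>> rdd = sc.textFile("hdfs:///path/to/files/month=2014*")
>>>>> rdd = sc.textFile("hdfs:///path/to/files/data_file_{2013,2014}SEP0[1-9]*")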
>>>>>
>>>>>
>>>>> On Wed, Jun 18, 2014 at 12:09 AM, MEETHU MATHEW <
>>>>> meethu2...@yahoo.co.in> wrote:
>>>>>
>>>>>> Hi Jianshi,
>>>>>>
>>>>>> I have used wildcard characters (*) in my program and it worked.
>>>>>> My code was like this:
>>>>>> b = sc.textFile("hdfs:///path to file/data_file_2013SEP01*")
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Meethu M
>>>>>>
>>>>>>
>>>>>>   On Wednesday, 18 June 2014 9:29 AM, Jianshi Huang <
>>>>>> jianshi.hu...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> It would be convenient if Spark's textFile, parquetFile, etc. could
>>>>>> support paths with wildcards, such as:
>>>>>>
>>>>>>   hdfs://domain/user/jianshuang/data/parquet/table/month=2014*
>>>>>>
>>>>>>  Or is there already a way to do it now?
>>>>>>
>>>>>> Jianshi
>>>>>>
>>>>>> --
>>>>>> Jianshi Huang
>>>>>>
>>>>>> LinkedIn: jianshi
>>>>>> Twitter: @jshuang
>>>>>> Github & Blog: http://huangjs.github.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Jianshi Huang
>>>>
>>>> LinkedIn: jianshi
>>>> Twitter: @jshuang
>>>> Github & Blog: http://huangjs.github.com/
>>>>
>>>
>>>
>>>
>>> --
>>> Jianshi Huang
>>>
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github & Blog: http://huangjs.github.com/
>>>
>>
>>
>
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
