Hi Pradeep

>>> Looks like what I was suggesting doesn't work. :/
I guess you mean putting comma-separated paths into one string and passing
it to the existing API (SparkContext#textFile). That does not work. I
suggest creating a new API, SparkContext#textFiles, that accepts an array
of strings. I have already implemented a simple patch and it works.
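
To make the idea concrete, here is a rough sketch; the exact signature is
only an assumption, the real patch may differ:

// Hypothetical usage of the proposed SparkContext#textFiles (not in Spark yet):
//   val rdd = sc.textFiles(Array("hdfs:///data/a.txt", "hdfs:///data/b.txt"))

// One possible implementation sketch, written as a standalone helper that
// unions the per-path RDDs through SparkContext#union to keep the lineage flat:
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def textFiles(sc: SparkContext, paths: Array[String]): RDD[String] =
  sc.union(paths.map(p => sc.textFile(p)))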




On Thu, Nov 12, 2015 at 10:17 AM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:

> Looks like what I was suggesting doesn't work. :/
>
> On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Yes, that's what I suggest. TextInputFormat supports multiple inputs, so
>> on the Spark side we just need to provide an API for that.
>>
>> On Thu, Nov 12, 2015 at 8:45 AM, Pradeep Gollakota <pradeep...@gmail.com>
>> wrote:
>>
>>> IIRC, TextInputFormat supports an input path that is a comma-separated
>>> list. I haven't tried this, but I think you should just be able to do
>>> sc.textFile("file1,file2,...")
>>>
>>> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>
>>>> I know these workarounds, but wouldn't it be more convenient and
>>>> straightforward to use SparkContext#textFiles?
>>>>
>>>> On Thu, Nov 12, 2015 at 2:27 AM, Mark Hamstra <m...@clearstorydata.com>
>>>> wrote:
>>>>
>>>>> For more than a small number of files, you'd be better off using
>>>>> SparkContext#union instead of RDD#union.  That will avoid building up a
>>>>> lengthy lineage.
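>>>>>
>>>>> A minimal sketch of that approach (the file paths are just placeholders):
>>>>>
>>>>> val paths = Seq("file1", "file2", "file3")
>>>>> // SparkContext#union combines all the RDDs at once, keeping the lineage flat
>>>>> val combined = sc.union(paths.map(p => sc.textFile(p)))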
>>>>>
>>>>> On Wed, Nov 11, 2015 at 10:21 AM, Jakob Odersky <joder...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey Jeff,
>>>>>> Do you mean reading from multiple text files? In that case, as a
>>>>>> workaround, you can use the RDD#union() (or ++) method to concatenate
>>>>>> multiple RDDs. For example:
>>>>>>
>>>>>> val lines1 = sc.textFile("file1")
>>>>>> val lines2 = sc.textFile("file2")
>>>>>>
>>>>>> val rdd = lines1 union lines2
>>>>>>
>>>>>> regards,
>>>>>> --Jakob
>>>>>>
>>>>>> On 11 November 2015 at 01:20, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>>> Users can use the HDFS glob syntax to read multiple inputs, but
>>>>>>> sometimes that is not convenient. I'm not sure why there's no
>>>>>>> SparkContext#textFiles API; it should be easy to implement. I'd love
>>>>>>> to create a ticket and contribute a patch if there's no other
>>>>>>> consideration that I'm not aware of.
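>>>>>>>
>>>>>>> For reference, the glob workaround with the existing API looks roughly
>>>>>>> like this (the path is just a placeholder):
>>>>>>>
>>>>>>> // Hadoop glob patterns are expanded by the underlying FileInputFormat
>>>>>>> val rdd = sc.textFile("hdfs:///data/2015/11/*/part-*")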
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards
>>>>>>>
>>>>>>> Jeff Zhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>


-- 
Best Regards

Jeff Zhang
