Re: Is JavaSparkContext.wholeTextFiles distributed?

Hyukjin Kwon Tue, 26 Apr 2016 08:00:48 -0700

Also see the programming guide: https://spark.apache.org/docs/1.6.0/programming-guide.html


If the input is a single file, reading it will not be distributed: wholeTextFiles never splits a file, so the whole file is read as one record by one task.
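A rough way to see why: wholeTextFiles emits one (path, content) record per file, so several files can be grouped into a partition, but a file is never cut across partitions. The Java sketch below is a simplified, hypothetical model of that packing, not Spark's code. It assumes a target split size of total bytes / minPartitions (rounded up) and ignores the data-locality grouping that Hadoop's CombineFileInputFormat (which Spark's WholeTextFileInputFormat builds on) also performs. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model (not Spark's implementation) of packing whole files into splits.
class WholeFileSplitModel {

    /** Groups file sizes into splits; a file is always kept whole inside one split. */
    static List<List<Long>> packFiles(List<Long> fileSizes, int minPartitions) {
        long totalLen = 0;
        for (long size : fileSizes) {
            totalLen += size;
        }
        // Assumed target split size: total bytes / requested partitions, rounded up.
        int divisor = minPartitions <= 0 ? 1 : minPartitions;
        long maxSplitSize = (long) Math.ceil(totalLen * 1.0 / divisor);

        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : fileSizes) {
            // The whole file goes into the current split, however large it is.
            current.add(size);
            currentSize += size;
            // Close the split once it reaches the target size.
            if (currentSize >= maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        // Any leftover files form a final, smaller split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // One 1 GB file with minPartitions = 8: the file cannot be cut, so one split.
        System.out.println(packFiles(List.of(1_000_000_000L), 8).size());
        // Eight 100 MB files with minPartitions = 8: one file per split.
        System.out.println(packFiles(Collections.nCopies(8, 100_000_000L), 8).size());
    }
}
```

Running main prints 1 and then 8: asking for more partitions does not help a single file, while many files do get spread out. On a real cluster the actual count can be checked with `rdd.partitions().size()` on the `JavaPairRDD` returned by `JavaSparkContext.wholeTextFiles(path, minPartitions)`.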
On 26 Apr 2016 11:52 p.m., "Ted Yu" <yuzhih...@gmail.com> wrote:

> Please take a look at:
> core/src/main/scala/org/apache/spark/SparkContext.scala
>
>    * Do `val rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,
>    *
>    * <p> then `rdd` contains
>    * {{{
>    *   (a-hdfs-path/part-00000, its content)
>    *   (a-hdfs-path/part-00001, its content)
>    *   ...
>    *   (a-hdfs-path/part-nnnnn, its content)
>    * }}}
> ...
>   * @param minPartitions A suggestion value of the minimal splitting number for input data.
>
>   def wholeTextFiles(
>       path: String,
>       minPartitions: Int = defaultMinPartitions): RDD[(String, String)] = withScope {
>
> On Tue, Apr 26, 2016 at 7:43 AM, Vadim Vararu <vadim.var...@adswizz.com> wrote:
>
>> Hi guys,
>>
>> I'm trying to read many files from S3 using
>> JavaSparkContext.wholeTextFiles(...). Is that executed in a distributed
>> manner? Please point me to the place in the documentation where this is
>> specified.
>>
>> Thanks, Vadim.
>
