And also https://spark.apache.org/docs/1.6.0/programming-guide.html
If the input is a single file, the read will not be distributed: wholeTextFiles returns each file as one (path, content) record, so a single file ends up in a single task.

On 26 Apr 2016 11:52 p.m., "Ted Yu" <yuzhih...@gmail.com> wrote:

> Please take a look at:
> core/src/main/scala/org/apache/spark/SparkContext.scala
>
>    * Do `val rdd = sparkContext.wholeTextFile("hdfs://a-hdfs-path")`,
>    *
>    * <p> then `rdd` contains
>    * {{{
>    *   (a-hdfs-path/part-00000, its content)
>    *   (a-hdfs-path/part-00001, its content)
>    *   ...
>    *   (a-hdfs-path/part-nnnnn, its content)
>    * }}}
> ...
>    * @param minPartitions A suggestion value of the minimal splitting
>    *        number for input data.
>
>   def wholeTextFiles(
>       path: String,
>       minPartitions: Int = defaultMinPartitions): RDD[(String, String)] =
>     withScope {
>
> On Tue, Apr 26, 2016 at 7:43 AM, Vadim Vararu <vadim.var...@adswizz.com>
> wrote:
>
>> Hi guys,
>>
>> I'm trying to read many files from s3 using
>> JavaSparkContext.wholeTextFiles(...). Is that executed in a distributed
>> manner? Please give me a link to the place in documentation where it's
>> specified.
>>
>> Thanks, Vadim.
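To make the granularity point concrete, here is a small toy model in plain Java. It does not call Spark. It only illustrates that when every file is an indivisible record, the usable parallelism is capped by the number of files. The class name, the method names, and the round-robin placement are all assumptions made for illustration; Spark's actual grouping of files into partitions may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WholeFilePartitionSketch {

    // Toy model only: place each file, whole, into one of numPartitions buckets.
    // Round-robin placement is an illustrative assumption, not Spark's real logic.
    static List<List<String>> assignWholeFiles(List<String> files, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            // A file is never split: it goes to exactly one partition.
            partitions.get(i % numPartitions).add(files.get(i));
        }
        return partitions;
    }

    // Number of partitions that actually hold a file, i.e. the usable parallelism.
    static long busyPartitions(List<List<String>> partitions) {
        return partitions.stream().filter(p -> !p.isEmpty()).count();
    }

    public static void main(String[] args) {
        List<String> many = Arrays.asList(
                "part-00000", "part-00001", "part-00002", "part-00003", "part-00004");
        List<String> one = Arrays.asList("part-00000");

        // Five files over four requested partitions: all four get work.
        System.out.println(busyPartitions(assignWholeFiles(many, 4)));
        // One file over four requested partitions: only one gets work.
        System.out.println(busyPartitions(assignWholeFiles(one, 4)));
    }
}
```

In a real job the corresponding call would be JavaSparkContext.wholeTextFiles(path, minPartitions), where minPartitions is only a suggestion, as the quoted scaladoc says. Asking for more partitions than there are files cannot spread a single file over several tasks.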