Hello All,

I have an HDFS file with approx. *1.5 billion records*, stored as 500 part files
(258.2 GB in total). When I executed the following, I saw that it used 2,290
tasks, but shouldn't it be 500, matching the number of part files in the HDFS file?

val inputFile = <HDFS File>
val inputRdd = sc.textFile(inputFile)
inputRdd.count()
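
For what it is worth, the partition count can also be checked directly rather
than inferred from the task count (getNumPartitions is available on RDDs from
Spark 1.6 onward; partitions.length works on any version):

// count() runs one task per partition, so this should report 2290 as well.
println(inputRdd.getNumPartitions)
// Equivalent on Spark versions before 1.6:
println(inputRdd.partitions.length)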

I was hoping I could do the same with fewer partitions, so I tried the
following:

val inputFile = <HDFS File>
val inputRddNew = sc.textFile(inputFile, 500)
inputRddNew.count()

But it still used 2,290 tasks.
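
In case it is relevant: I know I could force fewer partitions with coalesce
after the read, as in the sketch below (coalescedRdd is just an illustrative
name), but I would like to understand why the textFile hint alone does not
achieve this.

// Possible workaround sketch: merge the 2290 partitions down to 500.
// With shuffle = false (the default), coalesce is a narrow dependency,
// so the subsequent count() should run as 500 tasks.
val coalescedRdd = inputRdd.coalesce(500)
coalescedRdd.count()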

As per the Scaladoc, I expected it to use the same number of partitions as the
HDFS file, i.e., 500.
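
For reference, the signature I am looking at in SparkContext is:

// From org.apache.spark.SparkContext:
def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]

The parameter is named minPartitions, which makes me wonder whether it is only
a lower bound rather than an exact count.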

It would be great if you could shed some light on this.

Thanks & Regards,
Gokula Krishnan (Gokul)