Hello All,

I have an HDFS file with approx. *1.5 billion records*, stored as 500 part files
(258.2 GB in total). When I executed the following, I saw that it used 2,290
tasks, but shouldn't it be 500, matching the number of part files in the HDFS file?

val inputFile = <HDFS File>
val inputRdd = sc.textFile(inputFile)
inputRdd.count()
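
For what it is worth, the partition count can also be checked directly rather
than inferred from the task count (getNumPartitions is available on RDDs from
Spark 1.6 onward; partitions.length works on any version):

// count() runs one task per partition, so this should report 2290 as well.
println(inputRdd.getNumPartitions)
// Equivalent on Spark versions before 1.6:
println(inputRdd.partitions.length)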

I was hoping I could do the same with fewer partitions, so I tried the
following:

val inputFile = <HDFS File>
val inputRddNew = sc.textFile(inputFile, 500)
inputRddNew.count()

But it still used 2,290 tasks.
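
In case it is relevant: I know I could force fewer partitions with coalesce
after the read, as in the sketch below (coalescedRdd is just an illustrative
name), but I would like to understand why the textFile hint alone does not
achieve this.

// Possible workaround sketch: merge the 2290 partitions down to 500.
// With shuffle = false (the default), coalesce is a narrow dependency,
// so the subsequent count() should run as 500 tasks.
val coalescedRdd = inputRdd.coalesce(500)
coalescedRdd.count()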

As per the Scaladoc, I expected it to use the same number of partitions as the
HDFS file, i.e., 500.
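
For reference, the signature I am looking at in SparkContext is:

// From org.apache.spark.SparkContext:
def textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]

The parameter is named minPartitions, which makes me wonder whether it is only
a lower bound rather than an exact count.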

It would be great if you could shed some light on this.

Thanks & Regards,
Gokula Krishnan (Gokul)