Hello All, our Spark applications are designed to process HDFS files (Hive external tables).
We recently changed the Hive output file sizes by setting the following parameters, so that the files have an average size of 512 MB:

set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
set hive.merge.smallfiles.avgsize=536870912  (512 MB)

Since then, sc.textFile(<HDFS file>).count() behaves noticeably differently: the run time has increased drastically, apparently because the larger files are now read with fewer partitions.

Is it always better to read a file in Spark with more partitions? Based on this, I am planning to revert the Hive settings.

Thanks & Regards,
Gokula Krishnan (Gokul)
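
P.S. To make the question concrete, here is a minimal sketch (Spark shell, Scala) of what I understand could raise the read parallelism without reverting the Hive merge settings, by passing the optional minPartitions hint to textFile. The path and the partition count below are hypothetical, just for illustration:

// Spark shell (Scala): ask the Hadoop input format for more, smaller
// splits instead of one partition per HDFS block of the merged 512 MB files.
val path = "hdfs:///warehouse/my_external_table"   // hypothetical path
val rdd  = sc.textFile(path, minPartitions = 400)  // hypothetical hint
println(rdd.getNumPartitions)                      // verify the resulting split count
println(rdd.count())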