Hello All, our Spark applications are designed to process HDFS files (Hive external tables).
We recently changed the Hive output file sizes by setting the following parameters, so that the files have an average size of 512 MB:

set hive.merge.mapfiles=true
set hive.merge.mapredfiles=true
set hive.merge.smallfiles.avgsize=536870912  (512 MB)

Since then, sc.textFile(<HDFS file>).count() behaves noticeably differently: the run time has increased drastically, apparently because the larger files are now read with fewer partitions.

Is it always better to read a file in Spark with more partitions? Based on this, I am planning to revert the Hive settings.

Thanks & Regards,
Gokula Krishnan (Gokul)
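
P.S. To make the question concrete, here is a minimal sketch (Spark shell, Scala) of what I understand could raise the read parallelism without reverting the Hive merge settings, by passing the optional minPartitions hint to textFile. The path and the partition count below are hypothetical, just for illustration:

// Spark shell (Scala): ask the Hadoop input format for more, smaller
// splits instead of one partition per HDFS block of the merged 512 MB files.
val path = "hdfs:///warehouse/my_external_table"   // hypothetical path
val rdd  = sc.textFile(path, minPartitions = 400)  // hypothetical hint
println(rdd.getNumPartitions)                      // verify the resulting split count
println(rdd.count())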