You can also try coalesce, as it avoids a full shuffle (it merges existing partitions instead of redistributing all rows the way repartition does).
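A minimal sketch of the idea, assuming the data comes back as a DataFrame from `spark.sql` (table names and the target count of 200 are illustrative, not from the thread). `coalesce(200)` collapses the 2000 shuffle partitions into 200 output files without another full shuffle, while `repartition(200)` would shuffle again but produce more evenly sized files:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("coalesce-example")
  .enableHiveSupport()
  .getOrCreate()

// Run the transformations with high shuffle parallelism for speed.
spark.conf.set("spark.sql.shuffle.partitions", "2000")
val result = spark.sql("SELECT * FROM source_table")  // hypothetical query

// coalesce narrows to 200 partitions (so 200 files per Hive partition)
// without a full shuffle; swap in repartition(200) if the coalesced
// partitions come out badly skewed and even file sizes matter more.
result.coalesce(200)
  .write
  .mode(SaveMode.Overwrite)
  .insertInto("target_table")  // hypothetical Hive table
```

Note that `coalesce` can reduce the parallelism of the final write stage (200 tasks write the output instead of 2000), so there is a trade-off between file count and write throughput.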
Regards,
Tushar Adeshara
Technical Specialist – Analytics Practice
Cell: +91-81490 04192
Persistent Systems Ltd. | Partners in Innovation | www.persistentsys.com

________________________________
From: KhajaAsmath Mohammed <mdkhajaasm...@gmail.com>
Sent: 13 October 2017 09:35
To: user @spark
Subject: Spark - Partitions

Hi,

I am reading a Hive query and writing the data back into Hive after doing some transformations. I changed the setting spark.sql.shuffle.partitions to 2000, and since then the job completes fast, but the main problem is that I get 2000 files for each partition, each about 10 MB in size. Is there a way to get the same performance but write fewer files? I am trying repartition now, but would like to know if there are any other options.

Thanks,
Asmath