That's due to the config setting spark.sql.shuffle.partitions, which defaults to 200. A join triggers a shuffle, and the shuffled result uses that many partitions regardless of the inputs' partition counts.
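A minimal sketch of the workaround, assuming a Spark 1.x `sqlContext` (as in spark-shell at the time of this thread) and the `df1`/`df2` DataFrames from the question below; on Spark 2.x+ the equivalent is `spark.conf.set(...)`:

```scala
// Lower the number of post-shuffle partitions before running the join.
// Default is 200; pick a value suited to your data size and cluster.
sqlContext.setConf("spark.sql.shuffle.partitions", "12")

val joined = df1.join(df2, df1("id") === df2("id"), "inner")

// The joined DataFrame should now have the configured number of
// shuffle partitions instead of 200:
println(joined.rdd.partitions.length)
```

Note this setting applies to all subsequent shuffles (joins, aggregations) in the session, not just this one join.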
From: Masf
Date: Thursday, May 28, 2015 at 10:02 AM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Dataframe Partitioning

Hi.

I have 2 DataFrames with 1 and 12 partitions respectively. When I do an inner join between these DataFrames, the result contains 200 partitions. Why?

df1.join(df2, df1("id") === df2("id"), "inner") => returns 200 partitions

Thanks!!!

--
Regards.
Miguel Ángel