That’s due to the config setting spark.sql.shuffle.partitions, which defaults to 200.
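If 200 is more than you need for the join output, you can lower that setting before the join, or repartition the result afterwards. A minimal sketch, assuming a Spark 1.x SQLContext named sqlContext and the df1/df2 DataFrames from the question below:

// Lower the number of partitions Spark SQL uses for shuffles (joins, aggregations)
sqlContext.setConf("spark.sql.shuffle.partitions", "12")

val joined = df1.join(df2, df1("id") === df2("id"), "inner")

// Alternative: keep the default and explicitly repartition the join result
val smaller = joined.repartition(12)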


From: Masf
Date: Thursday, May 28, 2015 at 10:02 AM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Dataframe Partitioning

Hi.

I have 2 dataframes with 1 and 12 partitions respectively. When I do an inner
join between these dataframes, the result contains 200 partitions. Why?

df1.join(df2, df1("id") === df2("id"), "Inner") => returns 200 partitions
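
One way to see where the counts come from is to print the partition count of each DataFrame and of the join result (a small sketch, assuming the same df1/df2 as above):

println(df1.rdd.partitions.length)     // 1
println(df2.rdd.partitions.length)     // 12
val joined = df1.join(df2, df1("id") === df2("id"), "Inner")
println(joined.rdd.partitions.length)  // 200, from spark.sql.shuffle.partitions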


Thanks!!!
--


Regards.
Miguel Ángel
