That's due to the config setting spark.sql.shuffle.partitions, which defaults to 200. A join triggers a shuffle, and the shuffled result uses that many partitions regardless of the inputs' partition counts.
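A minimal sketch of the workaround, assuming a Spark 1.x `sqlContext` (as in spark-shell at the time of this thread) and the `df1`/`df2` DataFrames from the question below; on Spark 2.x+ the equivalent is `spark.conf.set(...)`:

```scala
// Lower the number of post-shuffle partitions before running the join.
// Default is 200; pick a value suited to your data size and cluster.
sqlContext.setConf("spark.sql.shuffle.partitions", "12")

val joined = df1.join(df2, df1("id") === df2("id"), "inner")

// The joined DataFrame should now have the configured number of
// shuffle partitions instead of 200:
println(joined.rdd.partitions.length)
```

Note this setting applies to all subsequent shuffles (joins, aggregations) in the session, not just this one join.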
From: Masf
Date: Thursday, May 28, 2015 at 10:02 AM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Dataframe Partitioning

Hi.

I have 2 DataFrames with 1 and 12 partitions respectively. When I do an inner join between these DataFrames, the result contains 200 partitions. Why?

df1.join(df2, df1("id") === df2("id"), "inner") => returns 200 partitions

Thanks!!!

--
Regards.
Miguel Ángel