I am sorting a data frame using something like:

```scala
val sortedDF = df.orderBy(df("score").desc)
```

The sorting itself is really fast. The issue is that, after sorting, the resulting data frame `sortedDF` appears to be in a single partition, which becomes a problem when I try to execute another operation on this new data frame (e.g. `sortedDF.limit(1000000)`). I then get an error like the following:

```
Job aborted due to stage failure: Total size of serialized results of 194 tasks (5.0 GB) is bigger than spark.driver.maxResultSize (5.0 GB)
```

I have already tried repartitioning `sortedDF` before doing any operation on it, but the same error appears.

*Is there any smarter way to use DataFrame `orderBy` on Spark, such that I do not have this problem?*

The Spark version I am currently using is 1.3.0, and due to company policy it is not possible for me to try a newer version. Thanks!

-- Cesar Flores
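For reference, here is a minimal sketch of what I am doing, including the repartition attempt (the partition count `200` is just an arbitrary value I picked for illustration; `df` is an existing DataFrame with a numeric `score` column):

```scala
// Sort descending by score — this part is fast.
val sortedDF = df.orderBy(df("score").desc)

// Attempted workaround: repartition the sorted result before any
// further operation. In my case this does not help — the same
// spark.driver.maxResultSize error appears afterwards.
val repartitioned = sortedDF.repartition(200)

// The operation that triggers the "Total size of serialized results
// ... is bigger than spark.driver.maxResultSize" failure.
val top = repartitioned.limit(1000000)
```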
The sorting is really fast. The issue I have is that after sorting, the resulting data frame sortedDF appears to be in a single partition, which is a problem because when I try to execute another operation in this new data frame (i.e sortedDF.limit(1000000)) I have an error like the following: Job aborted due to stage failure: Total size of serialized results of 194 tasks (5.0 GB) is bigger than spark.driver.maxResultSize (5.0 GB) I have already tried to repartition the resulting sortedDF before doing any operation on it, but the same error appears. *Is there any smarter way to use dataframe orderBy on Spark, such that I do not have this problem?* The current version of spark I am using is 1.3.0, and due to company policy it is not possible for me to try it in a newer version. Thanks!!! -- Cesar Flores