Hi,
After taking a look at the code, I found the problem:
Spark uses BroadcastNestedLoopJoin to handle a non-equality join condition,
and one of my DataFrames (df1) is created from an existing RDD (a LogicalRDD),
so it uses defaultSizeInBytes * length to estimate the size. The other
DataFrame (df2) th
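
A quick way to confirm which strategy the planner picked is to print the
physical plan. A minimal sketch, assuming the same sqlContext and registered
tables a and b as in the original question below (a.x and a.y are just
placeholder columns forming an illustrative non-equality predicate):

# With a non-equality predicate the planner cannot use a hash join,
# so the printed plan should contain BroadcastNestedLoopJoin.
df = sqlContext.sql("select a.someItem, b.someItem from a full outer join b "
                    "on a.x = b.x or a.y < b.y")
df.explain()

Since the broadcast side is chosen from the size estimates, the
defaultSizeInBytes guess on the LogicalRDD side is what skews the plan.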
Hi,
I am sorry to bother you again.
When I do a join as follows:
df = sqlContext.sql("select a.someItem, b.someItem from a full outer join b
on condition1 *or* condition2")
df.first()
the program failed because the result size is bigger than
spark.driver.maxResultSize.
It is really strange, as one record is n
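
For anyone hitting the same limit, a minimal sketch of raising it, assuming
a fresh PySpark application (the "2g" value is arbitrary; "0" disables the
cap entirely):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# spark.driver.maxResultSize caps the total serialized size of
# results collected back to the driver (default is 1g).
conf = SparkConf().set("spark.driver.maxResultSize", "2g")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

Note this only lifts the cap; it does not explain why a single record from
first() would exceed it in the first place.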