subject:"Re\: Questions about SparkSQL join on not equality conditions"

Re: Questions about SparkSQL join on not equality conditions

2015-08-11 Thread gen tang

Hi, After taking a look at the code, I found out the problem: As spark will use broadcastNestedLoopJoin to treat nonequality condition. And one of my dataframe(df1) is created from an existing RDD(logicalRDD), so it uses defaultSizeInBytes * length to estimate the size. The other dataframe(df2) th

Re: Questions about SparkSQL join on not equality conditions

2015-08-10 Thread gen tang

Hi, I am sorry to bother again. When I do join as follow: df = sqlContext.sql("selet a.someItem, b.someItem from a full outer join b on condition1 *or* condition2") df.first() The program failed at the result size is bigger than spark.driver.maxResultSize. It is really strange, as one record is n