Re: Proposal for SQL join optimization

2015-11-12 Thread Zhan Zhang
Hi Xiao, Performance-wise, without manual tuning the query cannot finish at all; with tuning, it finishes in minutes on TPC-H 100G data. I have created https://issues.apache.org/jira/browse/SPARK-11704 and https://issues.apache.org/jira/browse/SPARK-11705 for these two

Re: Proposal for SQL join optimization

2015-11-11 Thread Xiao Li
Hi Zhan, That sounds really interesting! Please @-mention me when you submit the PR. If possible, please also post the performance difference. Thanks, Xiao Li 2015-11-11 14:45 GMT-08:00 Zhan Zhang : > Hi Folks, > > I did some performance measurement based on TPC-H

Proposal for SQL join optimization

2015-11-11 Thread Zhan Zhang
Hi Folks, I did some performance measurements based on TPC-H recently and want to bring up two performance issues I observed. Both are related to Cartesian join. 1. CartesianProduct implementation. Currently CartesianProduct relies on RDD.cartesian, in which the computation is realized as
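The message above is cut off before it describes how RDD.cartesian computes its result. What follows is only a toy model in plain Python with no Spark dependency. It reflects my understanding of Spark's `CartesianRDD`: the output has one partition for every (left partition, right partition) pair. Each output partition runs a nested loop that re-creates the right-hand iterator for every left row. `cartesian_partitions` and `CountingPartition` are hypothetical names used for illustration. They are not Spark APIs.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


class CountingPartition:
    """Hypothetical stand-in for an uncached partition; counts recomputations."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.computes = 0

    def compute(self) -> Iterator:
        self.computes += 1
        return iter(self.rows)


def cartesian_partitions(
    left: Sequence[Sequence[A]],
    right: Sequence[Callable[[], Iterator[B]]],
) -> List[List[Tuple[A, B]]]:
    """Toy model of a Cartesian product: one output partition per (i, j) pair."""
    num_right = len(right)
    out: List[List[Tuple[A, B]]] = [[] for _ in range(len(left) * num_right)]
    for i, left_part in enumerate(left):
        for j, compute_right in enumerate(right):
            idx = i * num_right + j  # index of the output partition for (i, j)
            for x in left_part:
                # Nothing caches the right partition, so it is recomputed per left row.
                for y in compute_right():
                    out[idx].append((x, y))
    return out


left_parts = [[1, 2, 3], [4, 5]]
right_parts = [CountingPartition(["a", "b"]), CountingPartition(["c"])]
parts = cartesian_partitions(left_parts, [p.compute for p in right_parts])

print(len(parts))                              # 4 output partitions (2 x 2)
print(parts[1])                                # [(1, 'c'), (2, 'c'), (3, 'c')]
print(sum(p.computes for p in right_parts))    # 10 right-partition computations
```

In this model, the 5 left rows cause each of the 2 right partitions to be computed 5 times. If Spark behaves the same way, the repeated work grows with the size of the left side whenever the right side is expensive to produce and is not cached.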