Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Liquan Pei
ail.com ]
> *Sent:* September 30, 2014 18:34
> *To:* Haopu Wang
> *Cc:* dev@spark.apache.org; user
> *Subject:* Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?
>
> Hi Haopu,
>
> How about full outer join? One hash table may not be efficient for this case.

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-08 Thread Matei Zaharia
ng
> Cc: dev@spark.apache.org; user
> Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?
>
> Hi Haopu,
>
> How about full outer join? One hash table may not be efficient for this case.
>
> Liquan
>
> On Mon, Sep 29, 201

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-10-07 Thread Haopu Wang
@spark.apache.org; user
Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

Hi Haopu,

How about full outer join? One hash table may not be efficient for this case.

Liquan

On Mon, Sep 29, 2014 at 11:47 PM, Haopu Wang wrote:

Hi, Liquan, thanks for

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-30 Thread Liquan Pei
nt:* September 30, 2014 12:31
> *To:* Haopu Wang
> *Cc:* dev@spark.apache.org; user
> *Subject:* Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?
>
> Hi Haopu,
>
> My understanding is that the hashtable on both left and right side is used
>

RE: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Haopu Wang
anks again!

From: Liquan Pei [mailto:liquan...@gmail.com]
Sent: September 30, 2014 12:31
To: Haopu Wang
Cc: dev@spark.apache.org; user
Subject: Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

Hi Haopu,

My understanding is that the hashtable on both left and

Re: Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Liquan Pei
Hi Haopu,

My understanding is that the hash tables on both the left and right sides are used to include null values in the result efficiently. If a hash table is built on only one side, say the left side, and we perform a left outer join, then for each row on the left side, a scan over the right side is n
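
To make the two-hash-table idea concrete, here is a minimal Python sketch of a full outer join that builds a hash table on each side and then walks the union of the keys, padding the missing side with None. It is only an illustration, not Spark's HashOuterJoin code: the function names, the row layout (tuples with the join key in position 0), and the assumption that keys are non-null and sortable are all made up for this example.

```python
from collections import defaultdict


def build_hash_table(rows, key_index=0):
    """Group rows by join key (hypothetical helper, key in position 0)."""
    table = defaultdict(list)
    for row in rows:
        table[row[key_index]].append(row)
    return table


def full_outer_join_two_tables(left_rows, right_rows):
    """Full outer join using one hash table per side.

    Assumes join keys are non-null and mutually comparable; sorting is
    only there to make the output order deterministic.
    """
    left_table = build_hash_table(left_rows)
    right_table = build_hash_table(right_rows)
    result = []
    for key in sorted(set(left_table) | set(right_table)):
        # A key missing on one side is paired with a single None,
        # which plays the role of the null-padded row.
        lefts = left_table.get(key, [None])
        rights = right_table.get(key, [None])
        for left_row in lefts:
            for right_row in rights:
                result.append((left_row, right_row))
    return result


left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]
joined = full_outer_join_two_tables(left, right)
```

With the sample inputs, key 1 exists only on the left, key 3 only on the right, and key 2 on both, so each side gets null-padded once. The cost Haopu points out is visible here too: both inputs are fully materialized in memory before any output row is produced.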

Spark SQL question: why build hashtable for both sides in HashOuterJoin?

2014-09-29 Thread Haopu Wang
I took a look at HashOuterJoin, and it builds a hash table for both sides. This consumes quite a lot of memory when the partition is big. And it doesn't reduce the iteration over the streamed relation, right? Thanks!
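
For comparison, the alternative the question hints at can be sketched as follows: a left outer join that builds a hash table only on the right (build) side and streams the left side one row at a time. This is a hedged illustration in Python, not code from Spark; the function name and the row layout (tuples with the join key in position 0) are assumptions for the example.

```python
from collections import defaultdict


def left_outer_join_one_table(streamed_left, build_right):
    """Left outer join with a single hash table on the build (right) side.

    Hypothetical sketch: rows are tuples whose first element is the join key.
    """
    right_table = defaultdict(list)
    for row in build_right:
        right_table[row[0]].append(row)

    # The left side is never materialized; each row is probed and emitted.
    for left_row in streamed_left:
        matches = right_table.get(left_row[0])
        if matches:
            for right_row in matches:
                yield (left_row, right_row)
        else:
            # No match on the right: keep the left row, pad with None.
            yield (left_row, None)


left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]
joined = list(left_outer_join_one_table(iter(left), right))
```

Only the build side is held in memory here, which is why a single hash table is enough for a left (or, symmetrically, right) outer join. A full outer join is harder because unmatched rows on the build side must also be emitted.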