Re: Sort Merge Join from the filesystem

2015-11-16 Thread Alex Nastetsky
> *From:* Alex Nastetsky [mailto:alex.nastet...@vervemobile.com]
> *Sent:* Tuesday, November 10, 2015 3:03 AM
> *To:* Cheng, Hao
> *Cc:* Reynold Xin; dev@spark.apache.org
> *Subject:* Re: Sort Merge Join from the filesystem
>
> Thanks for creating that ticket

Re: Sort Merge Join from the filesystem

2015-11-09 Thread Alex Nastetsky
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Thursday, November 5, 2015 1:36 AM
> *To:* Alex Nastetsky
> *Cc:* dev@spark.apache.org
> *Subject:* Re: Sort Merge Join from the filesystem
>
> It's not supported yet, and not sure if there is a

RE: Sort Merge Join from the filesystem

2015-11-09 Thread Cheng, Hao
: Reynold Xin; dev@spark.apache.org
Subject: Re: Sort Merge Join from the filesystem

Thanks for creating that ticket. Another thing I was thinking of is doing this type of join between dataset A, which is already partitioned/sorted on disk, and dataset B, which gets generated during the run

RE: Sort Merge Join from the filesystem

2015-11-04 Thread Cheng, Hao
:36 AM
To: Alex Nastetsky
Cc: dev@spark.apache.org
Subject: Re: Sort Merge Join from the filesystem

It's not supported yet, and not sure if there is a ticket for it. I don't think there is anything fundamentally hard here either.

On Wed, Nov 4, 2015 at 6:37 AM, Alex Nastetsky <alex.nas