Re: merge join already sorted data?

Takeshi Yamamuro Thu, 25 Feb 2016 22:58:23 -0800

Hi,

Spark SQL can track ordering assumptions on columns internally
(OrderedDistribution), but JDBC data sources do not support this: Spark
does not know how columns loaded from a database are ordered.
There is also no way to tell Spark about this order.
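
For reference, here is a minimal sketch of the merge step the question has
in mind, written in plain Python outside of Spark. It is not Spark's
implementation. The function name merge_join is hypothetical, and it assumes
both inputs are iterables of (key, value) pairs that are already sorted
ascending by key, as they would be when read from an indexed table with
ORDER BY.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs.

    Assumption: both inputs are already sorted ascending by key.
    Yields (key, left_value, right_value) tuples.
    """
    # Group consecutive rows that share a key on each side.
    left_groups = groupby(left, key=itemgetter(0))
    right_groups = groupby(right, key=itemgetter(0))
    lkey, lrows = next(left_groups, (None, None))
    rkey, rrows = next(right_groups, (None, None))

    # A rows value of None means that side is exhausted.
    while lrows is not None and rrows is not None:
        if lkey < rkey:
            # Left key has no partner yet; advance the left side.
            lkey, lrows = next(left_groups, (None, None))
        elif lkey > rkey:
            # Right key has no partner yet; advance the right side.
            rkey, rrows = next(right_groups, (None, None))
        else:
            # Same key on both sides: emit every left/right pairing.
            rvals = [v for _, v in rrows]
            for _, lv in lrows:
                for rv in rvals:
                    yield (lkey, lv, rv)
            lkey, lrows = next(left_groups, (None, None))
            rkey, rrows = next(right_groups, (None, None))
```

Each input is read once, front to back, and only the rows of the current
right-hand key are held in memory. That single pass is only correct because
of the ordering assumption, which is exactly the information Spark does not
have for data loaded over JDBC.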


thanks,



On Fri, Feb 26, 2016 at 2:22 PM, Ken Geis <geis....@gmail.com> wrote:

> I am loading data from two different databases and joining it in Spark.
> The data is indexed in the database, so it is efficient to retrieve the
> data ordered by a key. Can I tell Spark that my data is coming in ordered
> on that key so that when I join the data sets, they will be joined with
> little shuffling via a merge join?
>
> I know that Flink supports this, but its JDBC support is pretty lacking in
> general.
>
>
> Thanks,
>
> Ken
>
>


-- 
---
Takeshi Yamamuro
