Hi,

Internally, Spark SQL can place ordering assumptions on columns (OrderedDistribution), but JDBC data sources do not support this: Spark has no information about how the rows loaded from a database are ordered. There is also currently no way to tell Spark about that order.
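For context, here is a rough sketch of why known ordering matters for a merge join. It is plain Python, not Spark code and not Spark's implementation; the function name `merge_join` and the sample rows are made up for illustration. When both inputs are already sorted on the join key, the join is a single forward pass over each side with no re-sorting, which is the saving that is lost when the engine cannot assume the order.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists that are ALREADY sorted by key, in one forward pass."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1          # left key has no partner yet, advance left
        elif lk > rk:
            j += 1          # right key has no partner yet, advance right
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row carrying this key with every right row in the run.
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


# Hypothetical rows, each sorted by the first field (the join key).
orders = [(1, "book"), (2, "pen"), (2, "ink"), (4, "cup")]
users = [(1, "ann"), (2, "bob"), (3, "cy")]
result = merge_join(orders, users)
```

If either input is not actually sorted, this sketch silently drops matches, which is why an engine has to either know the order or sort the data itself.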
thanks,

On Fri, Feb 26, 2016 at 2:22 PM, Ken Geis <geis....@gmail.com> wrote:

> I am loading data from two different databases and joining it in Spark.
> The data is indexed in the database, so it is efficient to retrieve the
> data ordered by a key. Can I tell Spark that my data is coming in ordered
> on that key so that when I join the data sets, they will be joined with
> little shuffling via a merge join?
>
> I know that Flink supports this, but its JDBC support is pretty lacking in
> general.
>
>
> Thanks,
>
> Ken
>

--
---
Takeshi Yamamuro