A very simple example of adding a new operator to Spark SQL:
https://github.com/apache/spark/pull/1366
An example of adding a new type of join to Spark SQL:
https://github.com/apache/spark/pull/837

Basically, you will need to add a new physical operator that inherits from
SparkPlan and a Strategy that causes the query planner to select it.  Maybe
you can explain a little more what you mean by region-join?  If it's only a
different algorithm, and not a logically different type of join, then you
will not need to make some of the logical modifications that the second PR
did.
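To make that concrete, here is a simplified, standalone Scala model of the pattern: a physical operator plus a Strategy that pattern-matches the logical plan and emits it.  The real SparkPlan, Strategy, and LogicalPlan types live in org.apache.spark.sql.catalyst / execution and have much richer interfaces, and "RegionJoin" is just a hypothetical name here.

```scala
// Simplified stand-ins for Spark SQL's internal types; the real ones
// carry schemas, children, and produce RDDs of rows.
trait LogicalPlan
case class RegionJoinLogical(left: LogicalPlan, right: LogicalPlan)
  extends LogicalPlan

trait SparkPlan { def execute(): Seq[String] }

// The new physical operator: in real code, execute() would return an
// RDD of joined rows; this stub just labels its output.
case class RegionJoinExec(left: LogicalPlan, right: LogicalPlan)
  extends SparkPlan {
  def execute(): Seq[String] = Seq("region-join output")
}

// A Strategy inspects the logical plan and, when it matches, returns
// the physical operator; otherwise it returns Nil so the planner moves
// on to the next strategy.
object RegionJoinStrategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case RegionJoinLogical(l, r) => RegionJoinExec(l, r) :: Nil
    case _                       => Nil
  }
}
```

The planner tries strategies in order, so returning Nil for non-matching plans is what lets other strategies handle the rest of the tree.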

Often the hardest part here is going to be figuring out when to use one
join over another.  Right now the rules are pretty straightforward: the
joins that are picked first are the most efficient but only handle certain
cases (inner joins with equality predicates).  When that is not the case it
falls back on slower, but more general operators.  If there are more subtle
trade-offs involved then we may need to wait until we have more statistics
to help us make the choice.
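That "specialized first, general fallback" rule can be sketched in a few lines.  This is a hypothetical simplification of the selection logic, not Spark's actual planner code:

```scala
// Two hypothetical join choices: a fast path that only handles inner
// joins with equality predicates, and a general-purpose fallback.
sealed trait JoinChoice
case class EquiInnerJoin(keys: Seq[String]) extends JoinChoice // efficient, limited
case class GeneralJoin(predicate: String) extends JoinChoice   // slow, general

// Pick the efficient join when its preconditions hold; otherwise fall
// back to the operator that can handle any predicate.
def planJoin(isInner: Boolean,
             equalityKeys: Seq[String],
             predicate: String): JoinChoice =
  if (isInner && equalityKeys.nonEmpty) EquiInnerJoin(equalityKeys)
  else GeneralJoin(predicate)
```

A cost-based choice between several applicable algorithms would need the statistics mentioned above; this rule-based version only checks structural preconditions.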

I'd suggest opening a JIRA and proposing a design before going too far.

Michael


On Sat, Jul 26, 2014 at 3:32 AM, Christos Kozanitis <kozani...@berkeley.edu>
wrote:

> Hello
>
> I was wondering, is it easy for you guys to point me to what modules I need
> to update if I had to add extra functionality to Spark SQL?
>
> I was thinking to implement a region-join operator, and I guess I should
> add the implementation details under joins.scala, but what else do I need to
> modify?
>
> thanks
> Christos
>
