GitHub user westonpace added a comment to the discussion: Does Acero support 
multi-column joins?

This matches my understanding as well.  The API surface for Acero is large and 
making bindings would be a pretty daunting task.  I think Substrait was chosen 
to avoid this.

Note: Substrait itself has no problem expressing multi-column joins (which it 
appears you are aware of since you mention it in your question :smile:).  The 
join relation takes a join expression.  That expression could be `left.a = 
right.a AND left.b = right.b`.

Probably the lowest effort fix would be to adjust Acero's Substrait consumer to 
handle the multiple keys.  The logic is here: 
https://github.com/apache/arrow/blob/520ae44272d491bbb52eb3c9b84864ed7088f11a/cpp/src/arrow/engine/substrait/relation_internal.cc#L715

It's pretty simplistic and expects the expression to be a single equals 
expression.  However, if there were multiple join keys, the expression would be 
an AND of equals expressions.  So some better parsing of the expression tree 
would be needed.
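To illustrate the kind of tree walk that would be needed, here is a minimal Python sketch. It does not use the real Substrait protobuf classes or Acero's C++ consumer. `FieldRef`, `Call`, and `extract_join_keys` are hypothetical stand-ins, and the sketch assumes the join condition is built from Substrait's `equal` and `and` functions (where `and` may be nested or take more than two arguments). It flattens an AND of equalities into parallel left/right key lists and rejects anything else.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical, simplified stand-ins for Substrait expressions.
# The real consumer works on protobuf messages, not these classes.
@dataclass(frozen=True)
class FieldRef:
    side: str  # "left" or "right"
    name: str


@dataclass(frozen=True)
class Call:
    func: str  # e.g. "equal" or "and"
    args: Tuple[Expr, ...]


Expr = Union[FieldRef, Call]


def extract_join_keys(expr: Expr) -> Tuple[List[str], List[str]]:
    """Flatten an AND-of-equals join condition into parallel key lists."""
    left_keys: List[str] = []
    right_keys: List[str] = []

    def visit(e: Expr) -> None:
        if isinstance(e, Call) and e.func == "and":
            # AND may be nested or variadic: recurse into every argument.
            for arg in e.args:
                visit(arg)
        elif isinstance(e, Call) and e.func == "equal" and len(e.args) == 2:
            a, b = e.args
            if not (isinstance(a, FieldRef) and isinstance(b, FieldRef)):
                raise ValueError("equality operands must be field references")
            # Accept both `left.x = right.y` and `right.y = left.x`.
            if a.side == "right" and b.side == "left":
                a, b = b, a
            if (a.side, b.side) != ("left", "right"):
                raise ValueError("each equality must compare a left field to a right field")
            left_keys.append(a.name)
            right_keys.append(b.name)
        else:
            # Anything else (OR, <, computed keys, ...) is rejected outright.
            raise ValueError(f"unsupported join expression: {e!r}")

    visit(expr)
    return left_keys, right_keys


# left.a = right.a AND left.b = right.b
cond = Call("and", (
    Call("equal", (FieldRef("left", "a"), FieldRef("right", "a"))),
    Call("equal", (FieldRef("left", "b"), FieldRef("right", "b"))),
))
print(extract_join_keys(cond))  # (['a', 'b'], ['a', 'b'])
```

The two resulting lists map naturally onto a hash join that takes separate left and right key lists, as Acero's `HashJoinNodeOptions` does. Rejecting unsupported shapes, rather than guessing, keeps the consumer's behavior explicit.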

GitHub link: 
https://github.com/apache/arrow/discussions/46212#discussioncomment-12930363

----
This is an automatically sent email for user@arrow.apache.org.