Andy Grove created ARROW-12253:
----------------------------------

             Summary: [Rust] [Ballista] Implement scalable joins
                 Key: ARROW-12253
                 URL: https://issues.apache.org/jira/browse/ARROW-12253
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Rust - Ballista
            Reporter: Andy Grove
            Assignee: Andy Grove
             Fix For: 5.0.0


The main issue limiting scalability in Ballista today is that joins are 
implemented as hash joins in which every partition of the probe side loads 
the entire left (build) side into memory.

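To make this scalable, we need to hash partition both the left and right 
inputs on the join keys. Rows with equal keys then land in the same partition 
number on both sides, so each pair of left and right partitions can be joined 
independently and in parallel, with only one left partition in memory per task.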
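
As a rough illustration of the idea (not the DataFusion or Ballista 
implementation), here is a minimal std-only Rust sketch. The names Row, 
hash_partition, join_partition and partitioned_join are hypothetical, rows are 
simplified to (key, payload) pairs instead of Arrow record batches, and one 
thread per partition stands in for distributed executors.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical row type: (join key, payload).
type Row = (u64, String);

/// Assign each row to one of `n` partitions by hashing its join key.
/// Equal keys always map to the same partition index.
fn hash_partition(rows: Vec<Row>, n: usize) -> Vec<Vec<Row>> {
    let mut parts: Vec<Vec<Row>> = vec![Vec::new(); n];
    for row in rows {
        let mut hasher = DefaultHasher::new();
        row.0.hash(&mut hasher);
        let idx = (hasher.finish() % n as u64) as usize;
        parts[idx].push(row);
    }
    parts
}

/// Inner hash join of one left partition with the matching right partition.
/// Only this single left partition is held in the hash table.
fn join_partition(left: &[Row], right: &[Row]) -> Vec<(u64, String, String)> {
    // Build phase: index the left partition by key.
    let mut build: HashMap<u64, Vec<&String>> = HashMap::new();
    for (key, value) in left {
        build.entry(*key).or_default().push(value);
    }
    // Probe phase: look up each right row in the hash table.
    let mut out = Vec::new();
    for (key, right_value) in right {
        if let Some(left_values) = build.get(key) {
            for left_value in left_values {
                out.push((*key, (*left_value).clone(), right_value.clone()));
            }
        }
    }
    out
}

/// Hash partition both inputs, then join partition i of the left with
/// partition i of the right, one scoped thread per pair.
fn partitioned_join(left: Vec<Row>, right: Vec<Row>, n: usize) -> Vec<(u64, String, String)> {
    let left_parts = hash_partition(left, n);
    let right_parts = hash_partition(right, n);
    std::thread::scope(|scope| {
        let handles: Vec<_> = left_parts
            .iter()
            .zip(right_parts.iter())
            .map(|(l, r)| scope.spawn(move || join_partition(l, r)))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().unwrap())
            .collect()
    })
}
```

Calling partitioned_join(left, right, 4) returns the same rows as a single 
in-memory hash join, in an unspecified order. The sketch requires Rust 1.63 or 
later for std::thread::scope.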

There is already work underway in DataFusion to implement this that we can 
leverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
