Andy Grove created ARROW-11606:
----------------------------------

             Summary: [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
                 Key: ARROW-11606
                 URL: https://issues.apache.org/jira/browse/ARROW-11606
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust - DataFusion
            Reporter: Andy Grove


We have run into an issue in the Ballista project, where we are reconstructing
the Final and Partial HashAggregateExec operators [1] for distributed execution,
and we need some guidance.

The Partial HashAggregateExec is created and executes correctly.

However, when we create the Final HashAggregateExec, it does not find the
expected schema in its input operator. The Partial exec outputs field names
ending with "[sum]", "[count]", and so on, but the Final aggregate does not
seem to look for those names.
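To make the naming contract concrete, here is a minimal sketch in plain Rust. It is not the DataFusion API: the `Batch` type and the `state_field_name`, `partial_avg` and `final_avg` functions are hypothetical. It assumes the Partial stage names each intermediate state field by appending the state name in brackets (matching the "[sum]" and "[count]" suffixes above) and that the Final stage looks its input columns up by exactly those names. If the Final stage derives a different aggregate name, the lookup fails, which is the kind of mismatch described here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a record batch: column name -> column values.
type Batch = HashMap<String, Vec<f64>>;

/// Hypothetical helper: name of an intermediate state field, formed by
/// appending the state name in brackets (e.g. "AVG(x)[sum]").
fn state_field_name(aggregate: &str, state: &str) -> String {
    format!("{}[{}]", aggregate, state)
}

/// Partial phase for AVG: emit the sum and count of one partition's values.
fn partial_avg(aggregate: &str, values: &[f64]) -> Batch {
    let mut out = Batch::new();
    out.insert(state_field_name(aggregate, "sum"), vec![values.iter().sum::<f64>()]);
    out.insert(state_field_name(aggregate, "count"), vec![values.len() as f64]);
    out
}

/// Final phase for AVG: find the state columns by name, merge them, and
/// produce the final value. Fails if a state column cannot be found.
fn final_avg(aggregate: &str, inputs: &[Batch]) -> Result<f64, String> {
    let sum_name = state_field_name(aggregate, "sum");
    let count_name = state_field_name(aggregate, "count");
    let (mut sum, mut count) = (0.0, 0.0);
    for batch in inputs {
        let s = batch
            .get(&sum_name)
            .ok_or_else(|| format!("input has no field {}", sum_name))?;
        let c = batch
            .get(&count_name)
            .ok_or_else(|| format!("input has no field {}", count_name))?;
        sum += s.iter().sum::<f64>();
        count += c.iter().sum::<f64>();
    }
    if count == 0.0 {
        return Err("no input rows".to_string());
    }
    Ok(sum / count)
}

fn main() {
    let p1 = partial_avg("AVG(x)", &[1.0, 2.0, 3.0]);
    let p2 = partial_avg("AVG(x)", &[4.0, 5.0]);
    // Names agree: the Final phase merges the partial states.
    println!("{:?}", final_avg("AVG(x)", &[p1.clone(), p2.clone()]));
    // Names differ: the Final phase cannot find its input fields.
    println!("{:?}", final_avg("AVG(#x)", &[p1, p2]));
}
```

The point of the sketch is that both sides must derive the state field names from the same aggregate expression name; any difference in how that name is rendered breaks the lookup.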

It is also worth noting that the Final and Partial executors are not connected 
directly in this usage.

The Partial exec is executed and its output is streamed to disk.

The Final exec then runs against the output from the Partial exec.
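Because the two stages are only connected through files on disk, the field names written by the Partial stage act as a contract that the Final stage has to reconstruct independently. A minimal sketch of checking that contract at the boundary, using only the Rust standard library; the CSV-like file layout and the `state_field_name`, `write_partial` and `check_final_input` functions are hypothetical and are not how Ballista actually persists partial results.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: intermediate state field name, e.g. "AVG(x)[sum]".
fn state_field_name(aggregate: &str, state: &str) -> String {
    format!("{}[{}]", aggregate, state)
}

/// Hypothetical Partial stage for AVG: writes a header row with the state
/// field names, then one row with the sum and count, so the field names
/// survive the round trip through disk.
fn write_partial(path: &Path, aggregate: &str, values: &[f64]) -> io::Result<()> {
    let mut file = fs::File::create(path)?;
    writeln!(
        file,
        "{},{}",
        state_field_name(aggregate, "sum"),
        state_field_name(aggregate, "count")
    )?;
    writeln!(file, "{},{}", values.iter().sum::<f64>(), values.len())?;
    Ok(())
}

/// Hypothetical check run by the Final stage before merging: every state
/// field it will look up must appear in the header written by the Partial stage.
fn check_final_input(path: &Path, aggregate: &str) -> Result<(), String> {
    let text = fs::read_to_string(path).map_err(|e| e.to_string())?;
    let header = text.lines().next().ok_or("partial output is empty")?;
    let found: Vec<&str> = header.split(',').collect();
    for state in ["sum", "count"] {
        let expected = state_field_name(aggregate, state);
        if !found.contains(&expected.as_str()) {
            return Err(format!("expected field {} but input has {:?}", expected, found));
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("arrow_11606_partial.csv");
    write_partial(&path, "AVG(x)", &[1.0, 2.0, 3.0])?;
    // Same aggregate name on both sides: the check passes.
    println!("{:?}", check_final_input(&path, "AVG(x)"));
    // Final stage derives a different name: the mismatch is reported up front.
    println!("{:?}", check_final_input(&path, "AVG(#x)"));
    fs::remove_file(&path)
}
```

Validating the schema once at the stage boundary turns a confusing failure deep inside the Final aggregate into an explicit error that lists the field names actually found.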

Do we need to make changes in DataFusion to allow other crates to support this
kind of use case?

 [1] https://github.com/ballista-compute/ballista/pull/491

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
