Re: [C++][Acero] can Acero support distributed computation？

Sasha Krassovsky Thu, 06 Jul 2023 19:11:41 -0700

Hi Jiangtao,
Acero doesn’t support any distributed computation on its own. However, to get 
some simple distributed computation going, it would be sufficient to add a 
Shuffle node. For example, for aggregation, the Shuffle would assign a range of 
hashes to each node; each node would then hash-partition its batches locally 
and send each partition to the node that owns that hash range, where it gets 
aggregated. You’d also need a master node to merge the results afterwards.
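
To make the scheme above a bit more concrete, here is a rough sketch in plain 
Python. It is not Acero code and uses no Acero API: the workers are simulated 
as lists in one process, every function name is made up for illustration, and 
ownership is decided by hash modulo the worker count, which is a simplification 
of assigning each node a range of hashes.

```python
from collections import defaultdict
import zlib

NUM_WORKERS = 3  # assumption: a fixed, known number of worker nodes


def owner_of(key, num_workers):
    """Map a group key to the worker that owns it.

    crc32 is used because it is stable across processes, so every worker
    computes the same owner for the same key.
    """
    return zlib.crc32(key.encode("utf-8")) % num_workers


def hash_partition(batch, num_workers):
    """Split one local batch of (key, value) rows by owning worker."""
    parts = defaultdict(list)
    for key, value in batch:
        parts[owner_of(key, num_workers)].append((key, value))
    return parts


def shuffle(local_batches, num_workers):
    """Simulate the network exchange.

    local_batches[i] is the list of batches held by worker i. The result is
    one inbox per worker, holding every row whose key that worker owns.
    """
    inboxes = [[] for _ in range(num_workers)]
    for batches in local_batches:
        for batch in batches:
            for dest, rows in hash_partition(batch, num_workers).items():
                inboxes[dest].extend(rows)
    return inboxes


def aggregate_sum(rows):
    """Per-worker aggregation: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def merge(partials):
    """Master-side merge. Each key lives on exactly one worker after the
    shuffle, so the partial results are disjoint and can simply be unioned."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def distributed_sum(local_batches, num_workers=NUM_WORKERS):
    inboxes = shuffle(local_batches, num_workers)
    partials = [aggregate_sum(rows) for rows in inboxes]
    return merge(partials)
```

Because all rows for a given key end up on a single worker, the final merge 
never has to combine two partial states for the same group.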


In general, the Shuffle-by-hash scheme works well for relational queries where 
order doesn’t matter, but the time series functionality (e.g. as-of-join) 
wouldn’t work as well. 

Hope this helps! 
Sasha Krassovsky 

> On 6 July 2023, at 19:04, Jiangtao Peng <[email protected]> wrote:
> 
> 
> Hi there,
> 
> I’ve recently been learning about the Acero streaming execution engine, and 
> I’m wondering whether Acero supports distributed computing.
>  
> I have read the code for the aggregation node and kernel; the aggregation 
> kernel seems to hide the details of the intermediate aggregation state. If 
> multiple nodes run the Acero execution engine, how should aggregation tasks 
> be split?
>  
> If the current execution engine does not support distributed computing, taking 
> aggregation as an example, how would you plan to change the aggregate 
> kernel to support distributed computation?
>  
> Any help or tips would be appreciated.
> 
> Thanks,
> Jiangtao
