Hi Jiangtao,

Acero doesn't support any distributed computation on its own. However, to get simple distributed computation going, it would be sufficient to add a Shuffle node. For aggregation, for example, the Shuffle would assign a range of hashes to each node. Each node would then hash-partition its batches locally and send each partition to the corresponding node, where it would be aggregated. You'd also need a master node to merge the results afterwards.
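To make the shuffle idea concrete, here is a toy, in-process simulation in plain Python. It is a sketch under loud assumptions, not Acero's API: the "nodes" are just lists of `(key, value)` rows, "sending" a partition is a list copy, and the helpers `owner`, `shuffle`, `aggregate` and `distributed_sum` are hypothetical names made up for illustration. It computes `SUM(value) GROUP BY key` by giving each node a contiguous range of 32-bit hashes, partitioning locally, aggregating per owner, and then gathering on a master.

```python
from collections import defaultdict
from zlib import crc32


def owner(key: str, num_nodes: int) -> int:
    """Map a group key to the node that owns its slice of the hash space.

    crc32 returns an unsigned 32-bit value, so (h * num_nodes) >> 32 splits
    [0, 2**32) into num_nodes contiguous hash ranges. crc32 is used instead of
    hash() because Python salts str hashes per process, so separate nodes
    would disagree on where a key belongs.
    """
    h = crc32(key.encode("utf-8"))
    return (h * num_nodes) >> 32


def shuffle(local_rows, num_nodes):
    """Hash-partition one node's rows into one outbox per destination node."""
    outboxes = [[] for _ in range(num_nodes)]
    for key, value in local_rows:
        outboxes[owner(key, num_nodes)].append((key, value))
    return outboxes


def aggregate(rows):
    """Ordinary single-node SUM ... GROUP BY key."""
    sums = defaultdict(int)
    for key, value in rows:
        sums[key] += value
    return dict(sums)


def distributed_sum(partitions):
    """Simulate the shuffle-based aggregation across len(partitions) nodes."""
    num_nodes = len(partitions)
    # Step 1: every node hash-partitions its own rows locally.
    outboxes = [shuffle(rows, num_nodes) for rows in partitions]
    # Step 2: "send" partition dst from every source node to node dst.
    inboxes = [
        [row for src in outboxes for row in src[dst]] for dst in range(num_nodes)
    ]
    # Step 3: each node aggregates only the groups whose hashes it owns.
    per_node = [aggregate(rows) for rows in inboxes]
    # Step 4: the master gathers the results. Each key lives on exactly one
    # node, so merging is a plain union with no further combining.
    result = {}
    for partial in per_node:
        result.update(partial)
    return result


if __name__ == "__main__":
    nodes = [
        [("a", 1), ("b", 2), ("c", 3)],
        [("a", 10), ("c", 30)],
        [("b", 200), ("d", 4)],
    ]
    print(sorted(distributed_sum(nodes).items()))
    # prints [('a', 11), ('b', 202), ('c', 33), ('d', 4)]
```

Because the shuffle partitions by group key, no group is split across nodes, which is why the master's merge is a simple union here. A common refinement is to pre-aggregate locally before shuffling, so each node sends one partial state per group instead of raw rows; for SUM those partial states merge by addition.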
In general, the shuffle-by-hash scheme works well for relational queries where order doesn't matter, but time series functionality such as as-of join wouldn't work as well.

Hope this helps!
Sasha Krassovsky

> On 6 July 2023, at 19:04, Jiangtao Peng <[email protected]> wrote:
>
> Hi there,
>
> I've recently been learning the Acero streaming execution engine, and I'm wondering whether Acero supports distributed computing.
>
> I have read the code for the aggregation node and kernels. The aggregation kernel seems to hide the details of the intermediate aggregation state. If multiple nodes run the Acero execution engine, how should the aggregation tasks be split?
>
> If the current execution engine does not support distributed computing, taking aggregation as an example, how would you plan to transform the aggregate kernel to support distributed computation?
>
> Any help or tips would be appreciated.
>
> Thanks,
> Jiangtao
