Hi Sasha,

Thanks for your reply!

Maybe a Shuffle node is enough for data distribution, but what about the aggregation node? For example, a grouped mean kernel maintains a sum and a count for each group, and the master node needs the sum and count from each partition in order to merge the mean results. But the mean kernel does not seem to expose a method to export these sum and count values, nor a method to load them back for a further merge. Would it be feasible to provide export/load methods on the aggregation kernels?
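To make this concrete, here is a minimal, self-contained C++ sketch of the kind of export/merge interface I have in mind. This is not Acero's actual API; all of the types and function names below are hypothetical and only illustrate how a grouped mean could be merged from per-partition (sum, count) states:

#include <cstdint>
#include <unordered_map>

// Hypothetical partial state for a grouped mean: one (sum, count)
// pair per group. The division sum/count happens only once, on the
// master, after all partitions have been merged.
struct MeanPartialState {
  double sum = 0.0;
  int64_t count = 0;
};

using GroupId = int64_t;
using MeanStates = std::unordered_map<GroupId, MeanPartialState>;

// Each worker accumulates values into its local states...
void Accumulate(MeanStates& states, GroupId group, double value) {
  auto& s = states[group];
  s.sum += value;
  s.count += 1;
}

// ...then exports them, and the master loads and merges the exported
// states. Merging is just element-wise addition of sums and counts.
void Merge(MeanStates& into, const MeanStates& from) {
  for (const auto& [group, s] : from) {
    auto& dst = into[group];
    dst.sum += s.sum;
    dst.count += s.count;
  }
}

// The master finalizes each group only after every partition is merged.
double Finalize(const MeanPartialState& s) {
  return s.count == 0 ? 0.0 : s.sum / static_cast<double>(s.count);
}

With an interface like this, each worker would export its MeanStates, the master would Merge() them all, and only the master would call Finalize() to produce the final means.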
Any other tips would be appreciated.

Thanks,
Jiangtao

From: Sasha Krassovsky <[email protected]>
Date: Friday, July 7, 2023 at 10:12 AM
To: [email protected] <[email protected]>
Subject: Re: [C++][Acero] can Acero support distributed computation?

Hi Jiangtao,

Acero doesn't support any distributed computation on its own. However, to get some simple distributed computation going, it would be sufficient to add a Shuffle node. For aggregation, for example, the Shuffle would assign a range of hashes to each node; each node would then hash-partition its batches locally and send each partition to be aggregated on the corresponding node (a sketch of this partitioning step appears at the end of this thread). You'd also need a master node to merge the results afterwards.

In general the shuffle-by-hash scheme works well for relational queries where order doesn't matter, but the time-series functionality (e.g. the as-of join) wouldn't work as well.

Hope this helps!
Sasha Krassovsky

On July 6, 2023, at 19:04, Jiangtao Peng <[email protected]> wrote:

Hi there,

I have been learning about the Acero streaming execution engine recently, and I'm wondering whether Acero supports distributed computing. I have read the code for the aggregation node and kernels; the aggregation kernel seems to hide the details of its intermediate aggregation state. If multiple nodes each run the Acero execution engine, how can aggregation tasks be split among them? If the current execution engine does not support distributed computing, then, taking aggregation as an example, how would you plan to transform the aggregate kernel to support distributed computation?

Any help or tips would be appreciated.

Thanks,
Jiangtao
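For reference, a minimal sketch of the hash-partitioning step Sasha describes above, using a plain vector of rows in place of Arrow record batches. The Row type and HashPartition function are hypothetical, not part of Acero; in Acero the hash would be computed over the grouping columns of each batch:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical row type standing in for one row of a record batch;
// in Acero the key would come from the grouping columns.
struct Row {
  int64_t key;
  double value;
};

// Assign each row to one of num_nodes partitions by hashing its key.
// Equal keys always land on the same node, so each node can aggregate
// its partition independently before the master merges the results.
std::vector<std::vector<Row>> HashPartition(const std::vector<Row>& batch,
                                            std::size_t num_nodes) {
  std::vector<std::vector<Row>> partitions(num_nodes);
  for (const Row& row : batch) {
    std::size_t h = std::hash<int64_t>{}(row.key);
    partitions[h % num_nodes].push_back(row);
  }
  return partitions;
}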
