I've been thinking about this some more. I think the Ignite solution is nearly perfect, *if* the reduce operation runs within every node first (so that, for example, the results of ~96 threads on one Google Compute Engine instance are reduced/summarized to a single value), and then either a single final reduction occurs on one node (e.g. a leader node?) or, if we want to get fancy, the last reduction occurs in parallel across the nodes (though it's not clear whether a parallel reduction across all the nodes would be performant in the majority of real-life cases, given the overhead of doing so). So the question is: does the "reduce" operation run first within each node? If not, wouldn't it make sense to do so (to minimize transferring data to a single node for one giant reduction, and to make good use of all the nodes' cores)? And if not, is there an alternative way of achieving this in Ignite (i.e. what do developers do)?
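To make the question concrete, here is a rough sketch of the two-stage reduction I have in mind. It is plain Java using only the standard library and deliberately calls no Ignite API; the class and method names (TwoLevelReduce, reduceWithinNode, reduceAcrossNodes) are made up for illustration, a sum stands in for whatever the real reducer is, and a parallel stream stands in for the node's worker threads.

```java
import java.util.Arrays;
import java.util.List;

public class TwoLevelReduce {

    // Stage 1 (hypothetical): runs locally on one node. Each chunk stands in
    // for the output of one worker thread; the node collapses all of them
    // into a single value before anything would cross the network.
    public static long reduceWithinNode(List<long[]> perThreadChunks) {
        return perThreadChunks.parallelStream()
                .mapToLong(chunk -> Arrays.stream(chunk).sum())
                .sum();
    }

    // Stage 2 (hypothetical): runs on a single node (e.g. a leader) and
    // combines one already-reduced value per node.
    public static long reduceAcrossNodes(List<Long> nodeTotals) {
        return nodeTotals.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Two simulated nodes, each with two "threads" worth of results.
        List<long[]> nodeA = Arrays.asList(new long[]{1, 2}, new long[]{3});
        List<long[]> nodeB = Arrays.asList(new long[]{10}, new long[]{20, 30});

        // Only one value per node reaches the final reduction.
        List<Long> nodeTotals =
                Arrays.asList(reduceWithinNode(nodeA), reduceWithinNode(nodeB));

        System.out.println(reduceAcrossNodes(nodeTotals)); // prints 66
    }
}
```

The point of the sketch is only the data flow: with N nodes, the final step sees N values rather than one value per thread. It assumes the reduce operation is associative, so that reducing per node first gives the same answer as one big reduction.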
-- Sent from: http://apache-ignite-users.70518.x6.nabble.com/
