I'm new to Apache Ignite, but a long-time user of jdk8 streams. (And I've
used Google Cloud Dataflow.) I'm trying to understand the example described
in the latest doc:
https://apacheignite.readme.io/docs/distributed-closures#apply-methods
and compare it to jdk8 stream functionality.

Here's the code snippet from the Ignite doc:
```
IgniteCompute compute = ignite.compute();

// Execute closure on all cluster nodes.
Collection<Integer> res = compute.apply(
    String::length,
    Arrays.asList("How many characters".split(" "))
);
     
// Add all the word lengths received from cluster nodes.
int total = res.stream().mapToInt(Integer::intValue).sum(); 
```
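
For comparison, this is roughly the purely local jdk8 version I have in mind. It is only a sketch with no Ignite involved; the class and method names are made up for illustration. Everything here obviously runs in a single JVM:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical class name, used only for this illustration.
class LocalWordLengths {

    // Local stand-in for the "compute.apply(String::length, words)" step:
    // map every word of the sentence to its length.
    static Collection<Integer> lengths(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .map(String::length)
                .collect(Collectors.toList());
    }

    // Same reduction as in the Ignite doc snippet: sum the word lengths.
    static int total(Collection<Integer> res) {
        return res.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Collection<Integer> res = lengths("How many characters");
        System.out.println(total(res)); // 3 + 4 + 10
    }
}
```

In the Ignite snippet, `res` is declared as a plain `java.util.Collection<Integer>`, which is what makes me unsure whether the `res.stream()` step is distributed at all.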

My question is: is the "res.stream().mapToInt()" part *also* supposed to run on
*all* nodes in the cluster, or does it run only on the local node? (The
doc clearly specifies that "compute.apply" runs on all nodes, but is
silent on the res.stream() portion.) I'm assuming that Ignite works
similarly to Infinispan, which I *think* will run any streaming logic (even
logic with custom collectors) on *all* nodes (see
https://blog.infinispan.org/2018/01/improving-collect-for-distributed-java.html).

(Also, assuming the res.stream().mapToInt() logic runs on all nodes, is it
reasonable to assume that *within* each node it will run in parallel, and
that the number of threads *within* each node is equal to the number of
cores for that node?)

Also, if nodes differ in number of cores (or in current utilization), does
Ignite take that into account when distributing the work? If so, I'm assuming
that in the above example it makes that decision *not* in the res.stream() part
but instead in the *initial* "compute.apply" section, and that it then tries
to maintain the data partitioning?

Finally, assuming that res.stream().mapToInt() does run on all nodes, would my
existing jdk8 custom collectors (that follow the jdk8 streaming api for
collectors) also *automatically* work on all nodes, or would I need to do
some minor refactoring as implied by the Infinispan doc (by using
Infinispan's "SerializableSupplier<Collector<T, ?, R>>" instead of just
"Supplier<...>")? (In particular, is Ignite smart enough to first do the
collection *within* a node, and then start collecting across nodes? And,
although this is not that important for my needs, is Ignite even smart
enough to do the collection in a "logical" order between nodes, i.e. collect
first within a node, then collect between "close" nodes, then do the final
collection between the furthest nodes (e.g. on different racks, and then
even from different data centers)? This way, we minimize data transfer
etc.)
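
To make concrete what I mean by a custom collector, here is a minimal, hypothetical one written against the plain jdk8 `Collector` api (all names are made up for illustration, and nothing in it is Ignite-specific). My assumption, which the Ignite doc does not state, is that the combiner is the piece a distributed implementation would use to merge per-node partial results:

```java
import java.util.Arrays;
import java.util.stream.Collector;

// Hypothetical class name, used only for this illustration.
class TotalLengthCollector {

    // Mutable accumulator holding a running total of word lengths.
    static final class Acc {
        int total;
    }

    // A custom collector built with the standard jdk8 Collector.of factory.
    static Collector<String, Acc, Integer> totalLength() {
        return Collector.of(
                Acc::new,                                   // supplier: one accumulator per chunk
                (acc, word) -> acc.total += word.length(),  // accumulator: fold one word in
                (left, right) -> {                          // combiner: merge two partial results
                    left.total += right.total;
                    return left;
                },
                acc -> acc.total);                          // finisher: extract the final value
    }

    public static void main(String[] args) {
        // On a parallel stream the jdk may use the combiner to merge partial
        // results from different threads; this is the local analogue of the
        // "collect within a node, then across nodes" ordering asked about above.
        int total = Arrays.stream("How many characters".split(" "))
                .parallel()
                .collect(totalLength());
        System.out.println(total);
    }
}
```

Note that neither the lambdas nor `Acc` are `Serializable` here, which is why I'm asking whether such collectors would need refactoring before they could be shipped to other nodes.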

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
