Hello!

String::length will be run on all nodes, but Integer::intValue will be run
locally.

If you want it to be smarter than that, you could use MapReduce & ForkJoin:
https://apacheignite.readme.io/docs/compute-tasks

Regards,
-- 
Ilya Kasnacheev


вт, 30 окт. 2018 г. в 22:02, gsaxena888 <[email protected]>:

> I'm new to Apache Ignite, but a long-time user of jdk8 streams. (And I've
> used Google Cloud Dataflow.) I'm trying to understand the example described
> in latest doc:
> https://apacheignite.readme.io/docs/distributed-closures#apply-methods
> <https://apacheignite.readme.io/docs/distributed-closures#apply-methods>
> and comparing it to jdk8 stream functionality.
>
> Here's the code snippet from the Ignite doc:
> ```
> IgniteCompute compute  = ignite.compute();
>
> // Execute closure on all cluster nodes.
> Collection<Integer> res = compute.apply(
>     String::length,
>     Arrays.asList("How many characters".split(" "))
> );
>
> // Add all the word lengths received from cluster nodes.
> int total = res.stream().mapToInt(Integer::intValue).sum();
> ```
>
> My question is: does the "res.stream.mapToInt()" *also* supposed to run on
> *all* nodes in the cluster or does it only run only on the local node? (The
> doc clealy specifies that the "compute.apply" runs on all nodes, but is
> silent on the res.stream() portion.) I'm assuming that Ignite works
> similarily to Infinispan, which I *think* will run any streaming logic
> (even
> ones with custom collectors) on *all* nodes (see
>
> https://blog.infinispan.org/2018/01/improving-collect-for-distributed-java.html
> <
> https://blog.infinispan.org/2018/01/improving-collect-for-distributed-java.html>
>
> )
>
> (Also, assuming the res.stream.mapToInt... logic runs on all nodes, is it
> reasonable to assume that *within* each node it will run in parallel, and
> that the number of threads *within* each node is equal to the number of
> cores for that node?)
>
> Also, if nodes differ in number of cores (or in current utilization), does
> Ingnite take that into when distributing the work? If so, I'm assuming in
> the above example that it makes that decision *not* in the res.stream()
> part
> but instead in the *initial* "compute.apply" section, and that it then
> tries
> to maintain the data partitioning?
>
> Finally, assuming that res.stream.mapToInt() does run on all nodes, would
> my
> existing jdk8 custom collectors (that follow the jdk8 streaming api for
> collectors) also *automatically* work on all nodes or would I need to do
> some minor refactoring as implied by the Infinispan doc (by using
> Infinispan's "SerializableSupplier<Collector&lt;T, ?, R>>." instead of just
> "Supplier<...>"? (In particular, is Ignite smart enough to first do the
> collection *within* a node, and then start collecting across nodes? And,
> although this is not that important for my needs, is Ignite even smart
> enough to do the collection in a "logical" order between nodes, ie collect
> first within a node, then collect between "close" nodes, then do the final
> collection between the furthest nodes (eg on different racks and then next
> even from different data centers)? (This way, we minimize data transfer
> etc?))
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Reply via email to