Hello! String::length will be run on all nodes, but Integer::intValue will be run locally.
If you want it to be smarter than that, you could use MapReduce & ForkJoin: https://apacheignite.readme.io/docs/compute-tasks Regards, -- Ilya Kasnacheev вт, 30 окт. 2018 г. в 22:02, gsaxena888 <[email protected]>: > I'm new to Apache Ignite, but a long-time user of jdk8 streams. (And I've > used Google Cloud Dataflow.) I'm trying to understand the example described > in latest doc: > https://apacheignite.readme.io/docs/distributed-closures#apply-methods > <https://apacheignite.readme.io/docs/distributed-closures#apply-methods> > and comparing it to jdk8 stream functionality. > > Here's the code snippet from the Ignite doc: > ``` > IgniteCompute compute = ignite.compute(); > > // Execute closure on all cluster nodes. > Collection<Integer> res = compute.apply( > String::length, > Arrays.asList("How many characters".split(" ")) > ); > > // Add all the word lengths received from cluster nodes. > int total = res.stream().mapToInt(Integer::intValue).sum(); > ``` > > My question is: does the "res.stream.mapToInt()" *also* supposed to run on > *all* nodes in the cluster or does it only run only on the local node? (The > doc clealy specifies that the "compute.apply" runs on all nodes, but is > silent on the res.stream() portion.) I'm assuming that Ignite works > similarily to Infinispan, which I *think* will run any streaming logic > (even > ones with custom collectors) on *all* nodes (see > > https://blog.infinispan.org/2018/01/improving-collect-for-distributed-java.html > < > https://blog.infinispan.org/2018/01/improving-collect-for-distributed-java.html> > > ) > > (Also, assuming the res.stream.mapToInt... logic runs on all nodes, is it > reasonable to assume that *within* each node it will run in parallel, and > that the number of threads *within* each node is equal to the number of > cores for that node?) > > Also, if nodes differ in number of cores (or in current utilization), does > Ingnite take that into when distributing the work? If so, I'm assuming in > the above example that it makes that decision *not* in the res.stream() > part > but instead in the *initial* "compute.apply" section, and that it then > tries > to maintain the data partitioning? > > Finally, assuming that res.stream.mapToInt() does run on all nodes, would > my > existing jdk8 custom collectors (that follow the jdk8 streaming api for > collectors) also *automatically* work on all nodes or would I need to do > some minor refactoring as implied by the Infinispan doc (by using > Infinispan's "SerializableSupplier<Collector<T, ?, R>>." instead of just > "Supplier<...>"? (In particular, is Ignite smart enough to first do the > collection *within* a node, and then start collecting across nodes? And, > although this is not that important for my needs, is Ignite even smart > enough to do the collection in a "logical" order between nodes, ie collect > first within a node, then collect between "close" nodes, then do the final > collection between the furthest nodes (eg on different racks and then next > even from different data centers)? (This way, we minimize data transfer > etc?)) > > > > > > > -- > Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >
