Wait, RDD operations should in fact execute in parallel, right? So if I call rdd.forEachAsync, that should execute in parallel too, shouldn't it? I guess I am a little confused about what the difference really is between forEachAsync and forEachPartitionAsync, besides passing a Tuple versus an Iterator of Tuples to the lambda, respectively.
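
To make the question concrete, here is a minimal plain-Java sketch of how I currently picture the two calls. It is a model and not Spark itself: the class name AsyncForeachModel, the nested lists standing in for partitions, and the thread pool standing in for executors are all my own assumptions. (As far as I can tell, the Java API actually spells the methods foreachAsync and foreachPartitionAsync, and both return a future-like handle.) In this picture both calls run one task per partition in parallel, and "Async" only means the caller gets a future back instead of blocking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical model of the two calls; none of these names come from Spark.
public class AsyncForeachModel {

    // One task per partition; the lambda is called once per partition
    // and receives that partition's iterator.
    public static <T> CompletableFuture<Void> foreachPartitionAsync(
            List<List<T>> partitions, Consumer<Iterator<T>> f, ExecutorService pool) {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (List<T> partition : partitions) {
            tasks.add(CompletableFuture.runAsync(() -> f.accept(partition.iterator()), pool));
        }
        // Returns immediately; completes when every partition task has finished.
        return CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0]));
    }

    // Same fan-out over partitions; the lambda is called once per element.
    public static <T> CompletableFuture<Void> foreachAsync(
            List<List<T>> partitions, Consumer<T> f, ExecutorService pool) {
        return foreachPartitionAsync(partitions, it -> it.forEachRemaining(f), pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2, 3), Arrays.asList(4, 5), Arrays.asList(6));

        AtomicInteger elementCalls = new AtomicInteger();
        AtomicInteger partitionCalls = new AtomicInteger();

        // join() is where the caller chooses to block; the calls themselves do not.
        foreachAsync(partitions, x -> elementCalls.incrementAndGet(), pool).join();
        foreachPartitionAsync(partitions, it -> partitionCalls.incrementAndGet(), pool).join();

        System.out.println("lambda calls via foreachAsync: " + elementCalls.get());
        System.out.println("lambda calls via foreachPartitionAsync: " + partitionCalls.get());
        pool.shutdown();
    }
}
```

If this model is right, the only practical difference is how often the lambda runs: once per element versus once per partition, which would matter mainly for per-partition setup such as opening a connection.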
On Sun, Apr 2, 2017 at 8:36 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi all,
>
> What is the difference between forEachAsync vs forEachPartitionAsync? I
> couldn't find any comments from the Javadoc. If I were to guess here is
> what I would say but please correct me if I am wrong.
>
> forEachAsync just iterate through values from all partitions one by one in
> an Async Manner
>
> forEachPartitionAsync: Fan out each partition and run the lambda for each
> partition in parallel across different workers. The lambda here will
> Iterate through values from that partition one by one in Async manner
>
> Is this right? or am I completely wrong?
>
> Thanks!
>