mapPartition iterator

AnilKumar B Mon, 16 Jan 2017 14:29:07 -0800

Hi

For my use case, I need to call a third-party function (which is in-memory
based) on each partition's complete data. So I am partitioning the RDD logically
using repartition on the index column and applying a function f via
mapPartitions(f).


When I iterate through the mapPartitions iterator, can I assume that one task
processes only one particular partition's complete data (assuming the partition
is small in size)?

Or, to achieve this, do I need to use glom() after repartition instead of
mapPartitions?

And when exactly should I use preservesPartitioning=true on mapPartitions?

Thanks & Regards,
B Anil Kumar.
