Hi,

For my use case, I need to call a third-party function (which is in-memory based) on the complete data of each partition. So I am partitioning the RDD logically using repartition on the index column and applying a function f through mapPartitions(f).
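To make the setup concrete, here is a minimal PySpark sketch of what I am doing. The names `third_party_fn`, `process_partition`, `run_on_spark` and the `(index, value)` record shape are placeholders invented for illustration, not my real code. It also assumes a pair RDD keyed by the index and uses `partitionBy` to stand in for the "repartition on index column" step.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, float]  # (index, value); hypothetical record shape


def third_party_fn(rows: List[Row]) -> float:
    """Hypothetical stand-in for the in-memory third-party function."""
    return sum(value for _, value in rows)


def process_partition(rows: Iterable[Row]) -> Iterator[Tuple[int, float]]:
    """Function passed to mapPartitions; it receives one partition's iterator."""
    batch = list(rows)  # materialize the whole partition in memory
    if batch:  # a partition may be empty after shuffling
        # Emit (number of rows seen, result of the third-party call).
        yield len(batch), third_party_fn(batch)


def run_on_spark(sc, records: List[Row], num_partitions: int):
    """Run the sketch; `sc` is an existing pyspark.SparkContext from the caller."""
    keyed = sc.parallelize(records)            # pair RDD keyed by index
    parts = keyed.partitionBy(num_partitions)  # hash-partition by the key
    return parts.mapPartitions(process_partition).collect()
```

With hash partitioning, several index values may end up in the same partition, so `process_partition` here treats whatever arrives in one partition as a single batch.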
When I iterate through the mapPartitions iterator, can I assume that one task will process only one particular partition's complete data (assuming the partition is small in size)? Or, to achieve this, do I need to use glom() on the repartitioned RDD instead of mapPartitions? And when exactly should I use preservesPartitioning=true on mapPartitions?

Thanks & Regards,
B Anil Kumar.