What are the limitations of using Accumulators to get a union of a bunch of small sets?
Let's say I have an RDD[Map[String, Any]] and I want to do: rdd.map(m => accumulator += Set(m.get("entityType").get)). What implications does this have for performance? I'm assuming it doesn't aggregate at the driver each time I call += on the Accumulator. Does it combine locally and then occasionally send the current partition's results back to the driver?
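To make the question concrete, here is a minimal sketch of the setup I have in mind. It assumes the older Accumulator API (the one with +=, deprecated since Spark 2.0) and a local master. The names StringSetParam and EntityTypeUnion and the sample data are made up for illustration. I use foreach instead of map here only because map is lazy and would not run the updates without a following action.

```scala
import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

// Hypothetical helper (not part of Spark): tells the accumulator how to
// merge two partial results, here by taking the union of the two sets.
object StringSetParam extends AccumulatorParam[Set[String]] {
  def zero(initial: Set[String]): Set[String] = Set.empty[String]
  def addInPlace(a: Set[String], b: Set[String]): Set[String] = a ++ b
}

object EntityTypeUnion {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, just for experimenting.
    val conf = new SparkConf().setAppName("entity-type-union").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample records standing in for the real RDD[Map[String, Any]].
      val rdd = sc.parallelize(Seq[Map[String, Any]](
        Map("entityType" -> "person", "id" -> 1),
        Map("entityType" -> "company", "id" -> 2),
        Map("entityType" -> "person", "id" -> 3)
      ), numSlices = 2)

      val entityTypes = sc.accumulator(Set.empty[String])(StringSetParam)

      // foreach is an action, so the accumulator updates are executed.
      // Records without an "entityType" key are skipped instead of throwing.
      rdd.foreach { m =>
        m.get("entityType").foreach(t => entityTypes += Set(t.toString))
      }

      // The accumulated value is only readable on the driver.
      println(entityTypes.value)
    } finally {
      sc.stop()
    }
  }
}
```

What I would like to understand is how often, and at what granularity, the partial sets built by += in this sketch travel back to the driver.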