Hi, I was just thinking about the necessity for RDD replication. One case could be a large number of threads that all need the same RDD. Even though a single RDD can be shared by multiple threads belonging to the same application, I believe we could extract better parallelism if the RDD were replicated. Am I right?
I am eager to know whether there are any real-life applications or other scenarios that require an RDD to be replicated. Can someone please shed some light on the necessity for RDD replication? Thank you.