Re: Necessity for rdd replication.

2014-12-04 Thread Sameer Farooqui
In general, most use cases don't need the RDD to be replicated in memory
multiple times. It would be a rare exception to do this. If it's really
expensive (time consuming) to recompute a lost partition or if the use
case is extremely time sensitive, then maybe you could replicate it in
memory. But in general, you can safely rely on the RDD lineage graph to
re-create the lost partition if it gets discarded from memory.
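
To make the lineage point concrete, here is a rough sketch in plain Python.
It is a toy model only, not how Spark is implemented: ToyRDD and its
methods (partition, map, evict) are hypothetical names made up for this
illustration. Each dataset remembers its parent and the function that
derives it, so an evicted partition can be rebuilt on demand instead of
being read from a second in-memory copy.

```python
# Toy model of lineage-based recovery. ToyRDD is a hypothetical class
# written for illustration; it is not part of Spark.
class ToyRDD:
    def __init__(self, num_partitions, compute, parent=None):
        self.num_partitions = num_partitions
        self.compute = compute    # builds one partition from its source
        self.parent = parent      # lineage: the dataset this one derives from
        self.cache = {}           # partition index -> materialized data
        self.computations = 0     # how many times a partition was (re)built

    def partition(self, i):
        # Serve from the cache when the partition is still in memory.
        if i in self.cache:
            return self.cache[i]
        # Otherwise walk the lineage: get the parent's partition (or the
        # bare index for a root dataset) and re-apply the computation.
        source = self.parent.partition(i) if self.parent else i
        data = self.compute(source)
        self.computations += 1
        self.cache[i] = data
        return data

    def map(self, f):
        # A derived dataset that records this one as its parent.
        return ToyRDD(self.num_partitions,
                      lambda part: [f(x) for x in part],
                      parent=self)

    def evict(self, i):
        # Simulate a cached partition being discarded from memory.
        self.cache.pop(i, None)


base = ToyRDD(2, lambda i: list(range(i * 3, i * 3 + 3)))
squared = base.map(lambda x: x * x)

before = squared.partition(1)   # computed and cached
squared.evict(1)                # the partition is lost from memory
after = squared.partition(1)    # rebuilt from lineage, no replica needed
print(before == after)
```

The cost of losing the partition here is one extra computation. In Spark
itself, replication is what you ask for when that cost is too high: the
storage levels whose names end in _2 (for example
StorageLevel.MEMORY_ONLY_2) keep each partition on two nodes.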

As far as extracting better parallelism if the RDD is replicated, that
really depends on what sort of transformations and operations you're
running against the RDD, but again, generally speaking, you shouldn't need
to replicate it.

On Wed, Dec 3, 2014 at 11:54 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:

 Hi,

 I was just thinking about the necessity for RDD replication. One category
 could be something like a large number of threads requiring the same RDD.
 Even though a single RDD can be shared by multiple threads belonging to
 the same application, I believe we can extract better parallelism if the
 RDD is replicated. Am I right?

 I am eager to know if there are any real-life applications or any other
 scenarios which force an RDD to be replicated. Can someone please throw
 some light on the necessity for RDD replication?

 Thank you




Necessity for rdd replication.

2014-12-03 Thread rapelly kartheek
Hi,

I was just thinking about the necessity for RDD replication. One category
could be something like a large number of threads requiring the same RDD.
Even though a single RDD can be shared by multiple threads belonging to
the same application, I believe we can extract better parallelism if the
RDD is replicated. Am I right?

I am eager to know if there are any real-life applications or any other
scenarios which force an RDD to be replicated. Can someone please throw
some light on the necessity for RDD replication?

Thank you