I think you mean that data2 is a function of data1 in the first
example. I imagine that the second version is a little bit more
efficient.
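
To make the comparison concrete, here is a minimal sketch, assuming
Spark is on the classpath and run in local mode. The bodies of f1 and
f2 are hypothetical stand-ins, and sc.parallelize replaces
sc.textFile(...) so that no input file is needed. Both versions are
plain map transformations, nothing runs until collect() is called, and
the intermediate data1 is never stored.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazyMapSketch {
  // Hypothetical stand-ins for the f1 and f2 in the question.
  def f1(s: String): String = s.trim
  def f2(s: String): Int = s.length

  // Builds data2 both ways and returns both results for comparison.
  def run(sc: SparkContext): (Seq[Int], Seq[Int]) = {
    // Stand-in for sc.textFile(...); an assumption to keep this self-contained.
    val data = sc.parallelize(Seq(" a ", "bb", " ccc"))

    // Way 1: two chained maps. Defining data1 computes and stores nothing.
    val data1 = data.map(x => f1(x))
    val data2Chained = data1.map(x1 => f2(x1))

    // Way 2: a single map over the composed function.
    val data2Composed = data.map(x => f2(f1(x)))

    // collect() is the action that actually triggers the computation.
    (data2Chained.collect().toSeq, data2Composed.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("lazy-map-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (chained, composed) = run(sc)
      assert(chained == composed)
      println(chained.mkString(", "))
    } finally sc.stop()
  }
}
```

In spark-shell, where sc already exists, calling LazyMapSketch.run(sc)
directly should be enough.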

But this has nothing to do with memory or caching. You don't have to
cache anything here if you don't want to, and you can cache whatever
you like. Once the memory available for caching fills up, some
partitions will be dropped from the cache. Obviously, if your cache is
full of RDD partitions that you don't need, that wastes space that
could be used for caching data you do need. It's a good idea to
unpersist RDDs that no longer need to be cached, of course.
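
As a rough illustration of caching and then releasing an RDD, here is
a minimal sketch, again assuming Spark on the classpath in local mode.
The RDD name parsed and the toy data are hypothetical. cache() only
marks the RDD for caching; its partitions are stored when the first
action computes them, and unpersist() drops them again.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheSketch {
  // Returns the element count plus the storage level before and after unpersist.
  def run(sc: SparkContext): (Long, StorageLevel, StorageLevel) = {
    // Hypothetical data set that is reused by more than one action.
    val parsed = sc.parallelize(1 to 1000).map(_ * 2)

    // Mark for caching (MEMORY_ONLY); nothing is computed or stored yet.
    parsed.cache()

    val n = parsed.count()                // first action computes and caches
    parsed.filter(_ % 3 == 0).count()     // second action can reuse cached partitions
    val levelWhileCached = parsed.getStorageLevel

    // Finished with it: release the space for data that is still needed.
    parsed.unpersist()
    (n, levelWhileCached, parsed.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (n, before, after) = run(sc)
      println(s"count=$n cachedBefore=${before != StorageLevel.NONE} cachedAfter=${after != StorageLevel.NONE}")
    } finally sc.stop()
  }
}
```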

If you don't need the intermediate RDD data1, then certainly don't
cache it, but its mere existence costs very little: an uncached RDD is
only a description of a computation, not stored data.

On Mon, Oct 6, 2014 at 9:56 PM, anny9699 <[email protected]> wrote:
> Hi,
>
> I see that this type of question has been asked before, but I am still a
> little confused about it in practice. For example, there are two ways I could
> handle a series of RDD transformations before I run an RDD action. Which way
> is faster?
>
> Way 1:
> val data = sc.textFile()
> val data1 = data.map(x => f1(x))
> val data2 = data.map(x1 => f2(x1))
> println(data2.count())
>
> Way 2:
> val data = sc.textFile()
> val data2 = data.map(x => f2(f1(x)))
> println(data2.count())
>
> Since Spark doesn't materialize RDD transformations, I assume the two
> ways are equal?
>
> I ask because the memory of my cluster is very limited and I don't
> want to cache an RDD at a very early stage. Is it true that if I cache an
> RDD early and it takes up the space, then I need to unpersist it before I
> cache another one in order to save memory?
>
> Thanks a lot!
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/lazy-evaluation-of-RDD-transformation-tp15811.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
