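I think you mean that data2 is a function of data1 in the first example. I'd expect the second version to be slightly more efficient, since it avoids one extra layer of mapping per record. The difference should be small, though. Spark pipelines consecutive map() calls within the same stage, so in neither version is data1 materialized as a separate dataset.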
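To make the laziness concrete, here is a minimal plain-Scala sketch with no Spark dependency. It assumes that chained RDD maps behave like chained `map` calls on a single partition's `Iterator`, which is roughly how Spark evaluates narrow transformations inside a stage. `LazyPipelineDemo`, `f1`, `f2`, and the logging are hypothetical stand-ins for illustration only.

```scala
import scala.collection.mutable.ArrayBuffer

// Plain-Scala analogy (assumption: one RDD partition ~ one lazy Iterator).
object LazyPipelineDemo {
  // Records the order in which f1 and f2 are applied.
  val log = ArrayBuffer.empty[String]

  // Hypothetical stand-ins for the f1 and f2 in the question.
  def f1(x: Int): Int = { log += s"f1($x)"; x + 1 }
  def f2(x: Int): Int = { log += s"f2($x)"; x * 2 }

  // Way 1, with data2 derived from data1: two chained maps.
  def way1(records: Iterator[Int]): Int = {
    val data1 = records.map(x => f1(x))   // nothing runs yet
    val data2 = data1.map(x1 => f2(x1))   // still nothing runs
    data2.size                            // the "action" pulls each record through f1, then f2
  }

  // Way 2: one map with the two functions composed by hand.
  def way2(records: Iterator[Int]): Int =
    records.map(x => f2(f1(x))).size

  def main(args: Array[String]): Unit = {
    // Defining the chain does no work: nothing has been logged yet.
    val pending = Iterator(1, 2, 3).map(x => f1(x)).map(x1 => f2(x1))
    println(log.isEmpty)          // true
    println(pending.size)         // 3; f1 and f2 run only now, one record at a time
    log.clear()

    way1(Iterator(1, 2, 3))
    val log1 = log.toList
    log.clear()
    way2(Iterator(1, 2, 3))
    val log2 = log.toList

    println(log1)                 // List(f1(1), f2(2), f1(2), f2(3), f1(3), f2(4))
    println(log1 == log2)         // true: same per-record interleaving in both ways
  }
}
```

Both versions interleave f1 and f2 record by record and never build an intermediate collection for data1. What separates them is one extra iterator wrapper, not memory use.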
Either way, that has nothing to do with memory or caching. You don't have to cache anything here if you don't want to, and you can cache whatever you like. Once the memory available for caching fills up, some cached partitions will be dropped from the cache. Obviously, if your cache is full of RDD partitions you don't need, that wastes space that could hold data you do need. So it's a good idea to unpersist RDDs that no longer need to be cached. If you don't need the intermediate RDD data1, then certainly don't cache it. Merely defining it costs almost nothing, since transformations are lazy and nothing is computed until an action runs.

On Mon, Oct 6, 2014 at 9:56 PM, anny9699 wrote:
> Hi,
>
> I see that this type of question has been asked before, but I'm still a
> little confused about it in practice. There are two ways I could deal
> with a series of RDD transformations before I call an RDD action. Which
> way is faster?
>
> Way 1:
> val data = sc.textFile()
> val data1 = data.map(x => f1(x))
> val data2 = data.map(x1 => f2(x1))
> println(data2.count())
>
> Way 2:
> val data = sc.textFile()
> val data2 = data.map(x => f2(f1(x)))
> println(data2.count())
>
> Since Spark doesn't materialize RDD transformations, I assume the two
> ways are equal?
>
> I'm asking because the memory of my cluster is very limited and I don't
> want to cache an RDD at a very early stage. Is it true that if I cache an
> RDD early and it takes up space, I need to unpersist it before I cache
> another one in order to save memory?
>
> Thanks a lot!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/lazy-evaluation-of-RDD-transformation-tp15811.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
