shahabm wrote
> I noticed that rdd.cache() is not happening immediately rather due to lazy
> feature of Spark, it is happening just at the moment you perform some
> map/reduce actions. Is this true?
Yes. .cache() only marks the RDD for persistence; like a transformation, it is evaluated lazily, so nothing is actually cached until an action runs.

shahabm wrote
> If this is the case, how can I enforce Spark to cache immediately at its
> cache() statement? I need this to perform some benchmarking and I need to
> separate rdd caching and rdd transformation/action processing time.

Put an action immediately after .cache(). .cache().first() is low impact, since it only returns the first element rather than iterating, but for that same reason it may only materialize (and cache) the first partition. For benchmarking, where you want the whole RDD in the cache before timing anything else, .cache().count() is the safer choice: count() visits every partition and forces the entire RDD to be computed and cached.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enforce-RDD-to-be-cached-tp20230p20284.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
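To illustrate, here is a minimal PySpark sketch of the pattern above. This is an assumption-laden example, not from the thread: the app name, the input path "data.txt", and the word-length job are placeholders; it assumes a local Spark installation.

```python
import time
from pyspark import SparkContext

# Placeholder setup; any existing SparkContext and input path would do.
sc = SparkContext(appName="cache-benchmark")
rdd = sc.textFile("data.txt").cache()  # only marks the RDD for caching; lazy

# Phase 1: force materialization. count() touches every partition,
# so the whole RDD ends up in the cache, unlike first().
t0 = time.time()
n = rdd.count()
print("caching %d records took %.2fs" % (n, time.time() - t0))

# Phase 2: the timed workload now reads from the cache,
# so this measures processing time separately from caching time.
t0 = time.time()
total_chars = rdd.map(lambda line: len(line)).reduce(lambda a, b: a + b)
print("processing took %.2fs" % (time.time() - t0))

sc.stop()
```

The two timed phases are exactly the separation the original poster asked for: the first measures the cost of caching, the second the cost of the map/reduce work against already-cached data.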