Yes, that is my understanding of how it should work. But in my case, the first time I call collect, it reads the data from files on disk. Subsequent collect calls do not read the data files (verified from the logs). In the Spark UI I see only shuffle read and no shuffle write for those calls.
