Spark Version: 1.3.1
Cluster: Mesos 0.22.0
Scala Version: 2.10.4

I am seeing jobs run on my cluster when I invoke cache on an ordered DataFrame. I expected the last line of the code below not to trigger any cluster work. Is there some condition under which cache will do cluster work?

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

// Work is done to load the JSON into the DataFrame
val people = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil
)
val peoplDF = sqlContext.jsonRDD(people).toDF()

// No work is done for the orderBy, as expected
val orderBy = peoplDF.orderBy("name")

// Jobs are run when invoking cache; the expectation was that nothing would run on the cluster
val orderByCache = orderBy.cache
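
To narrow down which statement actually starts jobs, one option is to count job starts around each call with a listener. This is only a minimal sketch, assuming the same spark-shell session as the code in this post, with `sc` and `peoplDF` already defined. The names `jobsStarted` and `jobsDuring` are hypothetical helpers introduced here, and the one-second sleep is an arbitrary wait because listener events are delivered asynchronously.

```scala
import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

// Counts every job the scheduler starts on this SparkContext.
val jobsStarted = new AtomicInteger(0)

sc.addSparkListener(new SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
    jobsStarted.incrementAndGet()
  }
})

// Hypothetical helper: runs `body` and reports how many jobs started meanwhile.
def jobsDuring[T](label: String)(body: => T): T = {
  val before = jobsStarted.get
  val result = body
  Thread.sleep(1000) // assumption: long enough for pending listener events to arrive
  println(s"$label: ${jobsStarted.get - before} job(s) started")
  result
}

val ordered = jobsDuring("orderBy") { peoplDF.orderBy("name") }
val cached  = jobsDuring("cache")   { ordered.cache() }
```

The per-statement counts can then be compared with what the Spark UI shows for the same session.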