orderBy + cache is invoking work on mesos cluster

Corey Stubbs Thu, 09 Jul 2015 09:53:42 -0700

Spark Version: 1.3.1
Cluster: Mesos 0.22.0
Scala Version: 2.10.4

I am seeing work done on my cluster when I invoke cache on a DataFrame. I
expected the last line of the code below not to trigger any cluster work.
Is there some condition under which cache will do cluster work?



val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// Work is done to load the JSON into the DataFrame
val people = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil
)
val peopleDF = sqlContext.jsonRDD(people).toDF()
// No work is done for the orderBy, as expected
val orderBy = peopleDF.orderBy("name")
// Jobs are run when invoking cache; the expectation was that nothing
// would run on the cluster
val orderByCache = orderBy.cache




