It's already fixed in the master branch. Sorry that we forgot to update
this before releasing 1.2.0 and caused you trouble...
Cheng
On 2/2/15 2:03 PM, ankits wrote:
Great, thank you very much. I was confused because this is in the docs:
Actually `SchemaRDD.cache()` behaves exactly the same as `cacheTable`
since Spark 1.2.0. The reason why your web UI didn't show you the cached
table is that both `cacheTable` and `sql("SELECT ...")` are lazy :-)
Simply add a `.collect()` after the `sql(...)` call.
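A minimal sketch of the pattern described above, assuming a Spark 1.2-era `SQLContext` inside spark-shell (the table name `t` and the sample data are made up for illustration):

```scala
// spark-shell session sketch: caching is lazy, so the cached table
// only appears in the web UI's Storage tab after an action runs.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext._

case class KV(key: Int, value: String)
val rdd = sc.parallelize(1 to 1024).map(i => KV(i, i.toString))
rdd.registerTempTable("t")

cacheTable("t")                  // lazy: nothing is materialized yet
sql("SELECT * FROM t").collect() // action: triggers the actual caching
```

This relies on the implicit `sc` provided by spark-shell, so it is a session fragment rather than a standalone program.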
Cheng
On 2/2/15 12:23 PM, ankits wrote:
Great, thank you very much. I was confused because this is in the docs:
https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, and on the
branch-1.2 branch,
https://github.com/apache/spark/blob/branch-1.2/docs/sql-programming-guide.md
Note that if you call schemaRDD.cache() rather than
Thanks for your response. So AFAICT calling
`parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()`
will allow me to see the size of the SchemaRDD in memory,
and `parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()` will
show me the size of a regular RDD.
But
Here is a toy `spark-shell` session snippet that can show the memory
consumption difference:

    import org.apache.spark.sql.SQLContext
    import sc._

    val sqlContext = new SQLContext(sc)
    import sqlContext._

    setConf("spark.sql.shuffle.partitions", "1")

    case class KV(key: Int, value: String)