Re: Get size of rdd in memory

2015-02-02 Thread Cheng Lian
It's already fixed in the master branch. Sorry that we forgot to update this before releasing 1.2.0 and caused you trouble... Cheng

On 2/2/15 2:03 PM, ankits wrote: Great, thank you very much. I was confused because this is in the docs:

Re: Get size of rdd in memory

2015-02-02 Thread Cheng Lian
Actually SchemaRDD.cache() behaves exactly the same as cacheTable since Spark 1.2.0. The reason why your web UI didn’t show you the cached table is that both cacheTable and sql(SELECT ...) are lazy :-) Simply add a .collect() after the sql(...) call. Cheng

On 2/2/15 12:23 PM,
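[Editor's note: a minimal sketch of the pattern Cheng describes, assuming a table has already been registered under the hypothetical name "people":]

    // Both calls are lazy in Spark 1.2: nothing is computed or cached yet.
    sqlContext.cacheTable("people")                      // "people" is a made-up table name
    val result = sqlContext.sql("SELECT * FROM people")

    // Running an action materializes both the query and the cache,
    // so the cached table then shows up in the web UI's Storage tab.
    result.collect()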

Re: Get size of rdd in memory

2015-02-02 Thread ankits
Great, thank you very much. I was confused because this is in the docs: https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, and on the branch-1.2 branch: https://github.com/apache/spark/blob/branch-1.2/docs/sql-programming-guide.md

Note that if you call schemaRDD.cache() rather than

Re: Get size of rdd in memory

2015-02-02 Thread ankits
Thanks for your response. So AFAICT calling parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count() will allow me to see the size of the SchemaRDD in memory, and parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count() will show me the size of a regular RDD. But
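[Editor's note: the archive appears to have eaten the "=>" arrows in the original message; spelled out, the two variants being compared look like this. A sketch, assuming the KV case class and the sc._/sqlContext._ imports from Cheng's snippet below:]

    // Cached via SchemaRDD: stored in the in-memory columnar format.
    parallelize(1 to 1024).map(i => KV(i, i.toString)).toSchemaRDD.cache().count()

    // Cached as a plain RDD of case-class instances: deserialized objects on the heap.
    parallelize(1 to 1024).map(i => KV(i, i.toString)).cache().count()

[After each count(), the web UI's Storage tab reports the cached size, which is how the two memory footprints can be compared.]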

Re: Get size of rdd in memory

2015-01-30 Thread Cheng Lian
Here is a toy spark-shell session snippet that can show the memory consumption difference:

    import org.apache.spark.sql.SQLContext
    import sc._
    val sqlContext = new SQLContext(sc)
    import sqlContext._
    setConf("spark.sql.shuffle.partitions", "1")
    case class KV(key: Int, value: String)
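[Editor's note: the archive truncates Cheng's snippet here. A plausible continuation, consistent with the expressions ankits quotes above (an assumption, not the original text), would cache the same data both ways and compare the two entries in the web UI's Storage tab:]

    // Plain RDD cache: deserialized case-class objects.
    val rdd = parallelize(1 to 1024).map(i => KV(i, i.toString))
    rdd.cache().count()

    // SchemaRDD cache: in-memory columnar format, typically much smaller.
    val schemaRDD = rdd.toSchemaRDD
    schemaRDD.cache().count()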