Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-30 Thread Marcelo Vanzin
+1 (non-binding) Ran spark-shell and Scala jobs on top of YARN (using the hadoop-2.4 tarball). There's a very slight behavioral change in the API. This code now throws an NPE: new SparkConf().setIfMissing("foo", null) It worked before. It's probably fine, though, since `SparkConf.set` would
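
(For context, a minimal reproduction sketch of the one-liner quoted above; the Try wrapper is added here only so the snippet can run in a shell without aborting, and is not part of the original report:)

  import org.apache.spark.SparkConf
  import scala.util.Try

  // On 1.2.0 this call succeeded; on 1.2.1 RC2 a null value is rejected,
  // so it fails with the NullPointerException described in the vote reply.
  val result = Try(new SparkConf().setIfMissing("foo", null))
  println(result)  // Failure(java.lang.NullPointerException) on 1.2.1 RC2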

Get size of rdd in memory

2015-01-30 Thread ankits
Hi, I want to benchmark the memory savings of using the in-memory columnar storage for SchemaRDDs (via cacheTable) vs. caching the SchemaRDD directly. It would be really helpful to be able to query this from the spark-shell or from jobs directly. Could a dev point me to a way to do this? From what
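
(One way to get at these numbers from the shell, sketched here as a suggestion rather than quoted from the thread, is SparkContext.getRDDStorageInfo, which reports per-RDD in-memory and on-disk sizes once an RDD has actually been materialized; the same figures appear in the web UI's Storage tab:)

  // Sketch: inspect cached RDD sizes from spark-shell (Spark 1.2-era API).
  val cached = sc.parallelize(1 to 1000000).cache()
  cached.count()  // force evaluation so the blocks are actually cached

  sc.getRDDStorageInfo.foreach { info =>
    println(s"${info.name}: ${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
  }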

Re: Get size of rdd in memory

2015-01-30 Thread Cheng Lian
Here is a toy spark-shell session snippet that can show the memory consumption difference: import org.apache.spark.sql.SQLContext; import sc._; val sqlContext = new SQLContext(sc); import sqlContext._; setConf("spark.sql.shuffle.partitions", "1"); case class KV(key: Int, value: String)
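
(A hedged continuation of that session, since the preview above is truncated: it assumes the SQLContext, the sqlContext._ import, and the KV case class from the snippet, registers an RDD of KV rows, caches it once through cacheTable (columnar) and once as a plain cached SchemaRDD, then compares the two entries via sc.getRDDStorageInfo or the web UI's Storage tab. The table name, row count, and value strings below are illustrative, not from the original reply:)

  import org.apache.spark.sql.SchemaRDD

  val rdd = sc.parallelize(1 to 100000).map(i => KV(i, s"value_$i"))

  // Columnar in-memory cache via cacheTable.
  val schemaRdd: SchemaRDD = rdd            // implicit createSchemaRDD conversion from sqlContext._
  schemaRdd.registerTempTable("kv")
  cacheTable("kv")
  sql("SELECT COUNT(*) FROM kv").collect()  // force materialization of the columnar cache

  // Plain SchemaRDD cache for comparison.
  schemaRdd.cache()
  schemaRdd.count()

  sc.getRDDStorageInfo.foreach(i => println(s"${i.name}: ${i.memSize} bytes in memory"))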