Re: setting heap space

2014-10-13 Thread Akhil Das
Like this:

    import org.apache.spark.storage.StorageLevel
    val rdd = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK_SER)

Thanks
Best Regards

On Mon, Oct 13, 2014 at 12:50 PM, Chengi Liu wrote:
> Cool.. Thanks.. And one last final question..
> conf = SparkConf.set().set(...)
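[Editor's note: since the question in this thread comes from PySpark, here is a rough Python equivalent of the snippet above. A minimal sketch, assuming Spark 1.x as in this thread, where pyspark still exposes the _SER storage levels; later releases drop them because Python data is always serialized. The app name is illustrative.]

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext(appName="persist-example")
    # Serialized storage that spills to disk instead of recomputing
    # partitions when memory runs out.
    rdd = sc.parallelize(range(1, 101)).persist(StorageLevel.MEMORY_AND_DISK_SER)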

Re: setting heap space

2014-10-13 Thread Chengi Liu
Cool.. Thanks.. And one last final question..

    conf = SparkConf.set().set(...)
    matrix = get_data(..)
    rdd = sc.parallelize(matrix)  # heap error here...

How and where do I set the storage level? It seems like conf is the wrong place to set this up, as I get this error: py4j.protocol.Py4

Re: setting heap space

2014-10-13 Thread Akhil Das
Like this you can set:

    sparkConf.set("spark.executor.extraJavaOptions",
      " -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300 ")

Here's a benchmark example
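[Editor's note: for the PySpark side of the thread, a minimal sketch of passing the same option through a conf object. The app name is illustrative; the JVM flags are the ones from the message above. Note this option cannot be used to set the executor heap size itself.]

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("extra-java-options-example")
            .set("spark.executor.extraJavaOptions",
                 "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC "
                 "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300"))
    sc = SparkContext(conf=conf)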

Re: setting heap space

2014-10-13 Thread Chengi Liu
Hi Akhil,
Thanks for the response.. Another query... do you know how to use the "spark.executor.extraJavaOptions" option?

    SparkConf.set("spark.executor.extraJavaOptions", "what value should go in here")?

I am trying to find an example but cannot seem to find one..

On Mon, Oct 13, 2014 at 12:03

Re: setting heap space

2014-10-13 Thread Akhil Das
Few things to keep in mind:

- I believe driver memory should not exceed executor memory.
- Set spark.storage.memoryFraction (the default is 0.6).
- Set spark.rdd.compress (the default is false).
- Always specify the level of parallelism while doing a groupBy, reduceBy, join, sortBy, etc.
- If you don't have
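[Editor's note: a short sketch of those suggestions in PySpark. Values are illustrative; spark.storage.memoryFraction is the Spark 1.x setting, superseded by unified memory management in later releases.]

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.storage.memoryFraction", "0.6")  # heap fraction for cached RDDs
            .set("spark.rdd.compress", "true"))          # compress serialized partitions
    sc = SparkContext(conf=conf)

    # Pass an explicit partition count to wide operations:
    pairs = sc.parallelize([(i % 10, i) for i in range(1000)])
    counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=100)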

setting heap space

2014-10-12 Thread Chengi Liu
Hi,
I am trying to use Spark but I am having a hard time configuring the SparkConf... My current conf is

    conf = SparkConf().set("spark.executor.memory", "10g").set("spark.akka.frameSize", "1").set("spark.driver.memory", "16g")

but I still see the java heap size error:

    14/10/12 09:54:50 ERROR
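[Editor's note: a hedged sketch of one likely culprit, a general fact about Spark's launch model rather than something confirmed in this thread: in client mode the driver JVM is already running by the time SparkConf is read, so spark.driver.memory set in code has no effect and must be passed at launch. The frame size value below is illustrative.]

    from pyspark import SparkConf, SparkContext

    # Set driver memory when launching, not in SparkConf:
    #   spark-submit --driver-memory 16g your_script.py
    conf = (SparkConf()
            .set("spark.executor.memory", "10g")
            .set("spark.akka.frameSize", "1000"))  # MB; illustrative value
    sc = SparkContext(conf=conf)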