With spark-1.0.0, this is the "cmdline" from /proc/#pid (with the export line export _JAVA_OPTIONS="..." still in .bashrc).
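The arguments in that file are separated by NUL bytes, so it prints as one run-together string; I am reading it with tr to make it legible (a minimal sketch, with $PID standing in for the actual java process id):

    # /proc/$PID/cmdline is NUL-separated; replace the NULs with spaces
    tr '\0' ' ' < /proc/$PID/cmdline; echo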
/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class SparkKMeans --verbose --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001

This is the "cmdline" from /proc/#pid with spark-0.8.0, launching KMeans with scala -J-Xms16g -J-Xmx16g. The export line from .bashrc is ignored here as well (if I launch without specifying the java options after the scala command, the heap gets the default value). The results below are from launching with the java options specified after the scala command:

/usr/java/jdk1.7.0_51/bin/java -Xmx256M -Xms32M -Xms16g -Xmx16g -Xbootclasspath/a:/home/spark2013/scala-2.9.3/lib/jline.jar:/home/spark2013/scala-2.9.3/lib/scalacheck.jar:/home/spark2013/scala-2.9.3/lib/scala-compiler.jar:/home/spark2013/scala-2.9.3/lib/scala-dbc.jar:/home/spark2013/scala-2.9.3/lib/scala-library.jar:/home/spark2013/scala-2.9.3/lib/scala-partest.jar:/home/spark2013/scala-2.9.3/lib/scalap.jar:/home/spark2013/scala-2.9.3/lib/scala-swing.jar -Dscala.usejavacp=true -Dscala.home=/home/spark2013/scala-2.9.3 -Denv.emacs= scala.tools.nsc.MainGenericRunner -J-Xms16g -J-Xmx16g -cp /home/spark2013/Runs/KMeans/GC/classes SparkKMeans local[24] /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001

Launching spark-1.0.0 with spark-submit and --driver-memory 10g does get picked up, but the execution results are the same: a lot of allocation failures.

/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms10g -Xmx10g org.apache.spark.deploy.SparkSubmit --driver-memory 10g --class SparkKMeans --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001

Adding --executor-memory 11g does not change the outcome:

cat /proc/13286/cmdline
/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms10g -Xmx10g org.apache.spark.deploy.SparkSubmit --driver-memory 10g --executor-memory 11g --class SparkKMeans --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001

So the Xmx and Xms can be altered, but the execution performance is rubbish compared to spark-0.8.0.
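For reference, this is the shape of the submit command I have been experimenting with. A sketch only: --driver-memory and --driver-java-options are flags documented for spark-submit in 1.0.0, the GC-logging options are the same ones from my _JAVA_OPTIONS export, and the 16g value is illustrative:

    # sketch: pass heap size and GC logging to the driver JVM explicitly
    ./bin/spark-submit \
      --class SparkKMeans \
      --master "local[24]" \
      --driver-memory 16g \
      --driver-java-options "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" \
      /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar \
      /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001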
How can I improve it?

Thanks

On Wednesday, July 2, 2014 9:34 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

Try looking at the running processes with "ps" to see their full command line and see whether any options are different. It seems like in both cases your young generation is quite large (11 GB), which doesn't make a lot of sense with a heap of 15 GB. But maybe I'm misreading something.

Matei

On Jul 2, 2014, at 4:50 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

> I ran SparkKMeans with a big file (~7 GB of data) for one iteration with spark-0.8.0, with this line in .bashrc: export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails". It finished in a decent time, ~50 seconds, and I had only a few "Full GC ..." messages from Java (a maximum of 4-5).
>
> Now, using the same export in .bashrc but with spark-1.0.0 (and running it with spark-submit), the first loop never finishes and I get a lot of:
>
> 18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]
>
> or
>
> 31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]
>
> I tried passing different parameters for the JVM through spark-submit, but the results are the same. This happens with java 1.7 and also with java 1.8. I do not know what the "Ergonomics" stands for ...
>
> How can I get decent performance from spark-1.0.0, considering that spark-0.8.0 did not need any fine-tuning of the garbage collection method (the default worked well)?
>
> Thank you
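Coming back to Matei's point about the young generation: the "Full GC (Ergonomics)" line above does show a PSYoungGen capacity of ~13 GB next to a 512K ParOldGen, so one experiment would be to bound the new size explicitly. A sketch only: -Xmn is a standard HotSpot flag, the 4g value is illustrative, and whether _JAVA_OPTIONS actually lands on the JVM that spark-submit launches is part of what is being debugged here:

    # sketch: keep the 15g heap but cap the young generation explicitly
    export _JAVA_OPTIONS="-Xms15g -Xmx15g -Xmn4g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"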