With spark-1.0.0, this is the "cmdline" from /proc/#pid (with the export line 
export _JAVA_OPTIONS="..." in .bashrc):

/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class SparkKMeans --verbose --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001
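
(/proc/#pid/cmdline separates arguments with NUL bytes, so a plain cat prints 
them run together. A quick way to print the same thing with spaces, where 
<pid> stands for the process id:)

# Translate the NUL separators in /proc/<pid>/cmdline to spaces so the
# arguments are readable; replace <pid> with the actual process id.
tr '\0' ' ' < /proc/<pid>/cmdline; echo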


This is the "cmdline" from /proc/#pid with spark-0.8.0 and launching KMeans 
with "scala "-J-Xms16g -J-Xms16g" ....". The export line from bashrc is ignored 
here also (If I do launch without specifying the java options after the scala 
command , the heap will have the default value) - the results below are from 
launching it with the java options specified after the scala command:

/usr/java/jdk1.7.0_51/bin/java -Xmx256M -Xms32M -Xms16g -Xmx16g -Xbootclasspath/a:/home/spark2013/scala-2.9.3/lib/jline.jar:/home/spark2013/scala-2.9.3/lib/scalacheck.jar:/home/spark2013/scala-2.9.3/lib/scala-compiler.jar:/home/spark2013/scala-2.9.3/lib/scala-dbc.jar:/home/spark2013/scala-2.9.3/lib/scala-library.jar:/home/spark2013/scala-2.9.3/lib/scala-partest.jar:/home/spark2013/scala-2.9.3/lib/scalap.jar:/home/spark2013/scala-2.9.3/lib/scala-swing.jar -Dscala.usejavacp=true -Dscala.home=/home/spark2013/scala-2.9.3 -Denv.emacs= scala.tools.nsc.MainGenericRunner -J-Xms16g -J-Xmx16g -cp /home/spark2013/Runs/KMeans/GC/classes SparkKMeans local[24] /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001
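
(A quick way to check whether the export is actually reaching the JVM: HotSpot 
announces _JAVA_OPTIONS on stderr whenever it honors the variable, so if the 
line below does not appear, the variable is not visible to that java process. 
A minimal check, assuming the same shell environment as the runs above:)

# HotSpot prints "Picked up _JAVA_OPTIONS: ..." on stderr whenever the
# variable is set and honored; grepping for it is a quick sanity check.
java -version 2>&1 | grep _JAVA_OPTIONS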


Launching spark-1.0.0 with spark-submit, --driver-memory 10g does get picked 
up, but the execution behaves the same: a lot of allocation failures.
/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms10g -Xmx10g org.apache.spark.deploy.SparkSubmit --driver-memory 10g --class SparkKMeans --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001


Adding --executor-memory 11g will not change the outcome:
cat /proc/13286/cmdline
/usr/java/jdk1.8.0_05/bin/java -cp ::/home/spark2013/spark-1.0.0/conf:/home/spark2013/spark-1.0.0/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark2013/spark-1.0.0/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms10g -Xmx10g org.apache.spark.deploy.SparkSubmit --driver-memory 10g --executor-memory 11g --class SparkKMeans --master local[24] /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001
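
(In case it is useful for comparison, the heap and GC-logging flags can also 
be passed through spark-submit itself instead of _JAVA_OPTIONS; this is only a 
sketch reusing the class, input and parameters from the runs above, not the 
exact command I ran:)

# Sketch: spark-1.0.0's spark-submit accepts --driver-memory and
# --driver-java-options, so the GC-logging flags from _JAVA_OPTIONS can be
# passed explicitly; paths and values mirror the runs above.
/home/spark2013/spark-1.0.0/bin/spark-submit \
  --class SparkKMeans --master local[24] \
  --driver-memory 15g \
  --driver-java-options "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" \
  /home/spark2013/KMeansWorkingDirectory/target/scala-2.10/sparkkmeans_2.10-1.0.jar \
  /home/spark2013/sparkRun/fisier_16mil_30D_R10k.txt 1024 0.001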

So Xmx and Xms can be altered, but performance is still rubbish compared to 
spark-0.8.0. How can I improve it?


Thanks
On Wednesday, July 2, 2014 9:34 PM, Matei Zaharia <matei.zaha...@gmail.com> 
wrote:
 


Try looking at the running processes with “ps” to see their full command line 
and see whether any options are different. It seems like in both cases, your 
young generation is quite large (11 GB), which doesn’t make a lot of sense with 
a heap of 15 GB. But maybe I’m misreading something.
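
(For example, something along these lines prints the full command line of a 
running JVM; <pid> is a placeholder for the Spark process id:)

# Print the full command line of one process (replace <pid> with the real id);
# -ww keeps ps from truncating long argument lists.
ps -ww -o pid,args -p <pid>
# Or list every java process together with its arguments.
ps -ef | grep [j]ava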

Matei

On Jul 2, 2014, at 4:50 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

I ran SparkKMeans with a big file (~7 GB of data) for one iteration with 
spark-0.8.0, with this line in .bashrc: export _JAVA_OPTIONS="-Xmx15g -Xms15g 
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails". It finished in a 
decent time, ~50 seconds, and I had only a few "Full GC..." messages from Java 
(a max of 4-5).

Now, using the same export in .bashrc but with spark-1.0.0 (and running it 
with spark-submit), the first loop never finishes and I get a lot of:

"18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]"
or

"31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]"

I tried passing different parameters for the JVM through spark-submit, but the 
results are the same. This happens with Java 1.7 and also with Java 1.8. I do 
not know what "Ergonomics" stands for...

How can I get decent performance from spark-1.0.0, considering that 
spark-0.8.0 did not need any fine tuning of the garbage collection method (the 
default worked well)?

Thank you
