Thank you all for the advice, including (1) using the CMS GC, (2) using multiple worker instances, and (3) using Tachyon.
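If I understand (1) and (2) correctly, the spark-env.sh change would look roughly like the sketch below. The sizing numbers are made up for my 192 GB box, not values anyone in this thread suggested:

    # spark-env.sh (standalone mode) -- hypothetical sizing, to be tuned
    # (2) run several smaller workers instead of one huge one
    export SPARK_WORKER_INSTANCES=4
    export SPARK_WORKER_MEMORY=45g
    # (1) CMS collector for shorter pauses; SPARK_JAVA_OPTS is deprecated
    # in 1.0 but still honored (with a WARNING in the logs)
    export SPARK_JAVA_OPTS="-XX:+UseConcMarkSweepGC -XX:MaxPermSize=256m"

Four 45g workers instead of one 180g worker should keep each heap, and hopefully each GC pause, much smaller.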
I will try (1) and (2) first and report back what I find. I will also try JDK 7 with the G1 GC.

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan

From: Aaron Davidson <ilike...@gmail.com>
To: user@spark.apache.org
Date: 06/15/2014 09:06 PM
Subject: Re: long GC pause during file.cache()

Note also that Java does not work well with very large JVM heaps, due to this exact issue. There are two commonly used workarounds:

1) Spawn multiple (smaller) executors on the same machine. This can be done by creating multiple Workers (via SPARK_WORKER_INSTANCES in standalone mode [1]).

2) Use Tachyon for off-heap caching of RDDs, allowing Spark executors to be smaller and to avoid GC pauses.

[1] See the standalone documentation here: http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts

On Sun, Jun 15, 2014 at 3:50 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

Yes, I think it is listed in the comments in spark-env.sh.template (didn't check...).

Best,
-- Nan Zhu

On Sunday, June 15, 2014 at 5:21 PM, Surendranauth Hiraman wrote:

Is SPARK_DAEMON_JAVA_OPTS valid in 1.0.0?

On Sun, Jun 15, 2014 at 4:59 PM, Nan Zhu <zhunanmcg...@gmail.com> wrote:

SPARK_JAVA_OPTS is deprecated in 1.0, though it still works if you don't mind the WARNING in the logs. Instead, you can set spark.executor.extraJavaOptions on your SparkConf object.
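For example, something like this in the driver program. This is a minimal, untested sketch; the master URL and app name are placeholders, and I am using the Spark 1.0 property name spark.executor.extraJavaOptions:

    import org.apache.spark.{SparkConf, SparkContext}

    // Pass GC flags to every executor JVM via SparkConf
    // (same effect as SPARK_JAVA_OPTS, without the deprecation WARNING).
    val conf = new SparkConf()
      .setMaster("spark://localhost:7077") // placeholder standalone master URL
      .setAppName("gc-tuning-test")        // placeholder app name
      .set("spark.executor.extraJavaOptions",
        "-XX:+UseConcMarkSweepGC -XX:MaxPermSize=256m")
    val sc = new SparkContext(conf)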
Best,
-- Nan Zhu

On Sunday, June 15, 2014 at 12:13 PM, Hao Wang wrote:

Hi, Wei

You may try setting JVM options in spark-env.sh as follows to prevent or mitigate GC pauses:

export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"

There are more options you could add; please just Google :)

Regards,
Wang Hao (王灏)
CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240
Email: wh.s...@gmail.com

On Sun, Jun 15, 2014 at 10:24 AM, Wei Tan <w...@us.ibm.com> wrote:

Hi,

I have a single-node (192 GB RAM) standalone Spark setup, with memory configured like this in spark-env.sh:

SPARK_WORKER_MEMORY=180g
SPARK_MEM=180g

In spark-shell I run a program like this:

    val file = sc.textFile("/localpath") // file size is 40 GB
    file.cache()
    val output = file.map(line => extract something from line)
    output.saveAsTextFile(...)

When I run this program again and again, or keep cycling through file.unpersist() --> file.cache() --> output.saveAsTextFile(), the run time varies a lot, from 1 min to 3 min to 50+ min. Whenever the run time exceeds 1 min, the stage monitoring GUI shows a long GC pause (some can be 10+ min); when the run time is "normal", say ~1 min, no significant GC is observed. The behavior seems somewhat random. Is there any JVM tuning I should do to prevent these long GC pauses?

I used java-1.6.0-openjdk.x86_64, and my spark-shell process looks like this:

root 10994 1.7 0.6 196378000 1361496 pts/51 Sl+ 22:06 0:12 /usr/lib/jvm/java-1.6.0-openjdk.x86_64/bin/java -cp ::/home/wtan/scala/spark-1.0.0-bin-hadoop1/conf:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-rdbms-3.2.1.jar:/home/wtan/scala/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms180g -Xmx180g org.apache.spark.deploy.SparkSubmit spark-shell --class org.apache.spark.repl.Main

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan