For the last question: you can trigger a GC in the JVM from Python with sc._jvm.System.gc()
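For example, a minimal sketch of how that could be wired into a training loop like the one described below - the ratings path, the parsing, the rank/iterations values and the number of runs are just placeholders, not taken from this thread:

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext(appName="als-loop")

    # Hypothetical input; the thread does not show how the ratings are loaded.
    ratings = sc.textFile("hdfs:///path/to/ratings") \
                .map(lambda line: line.split(",")) \
                .map(lambda f: Rating(int(f[0]), int(f[1]), float(f[2]))) \
                .cache()

    for run in range(5):  # several trainImplicit() runs within one session
        model = ALS.trainImplicit(ratings, rank=50, iterations=10)
        # ... evaluate or save the model here ...

        # Drop the driver-side reference so the finished model's RDDs and
        # broadcasts become unreachable, then force a GC in the driver JVM;
        # the ContextCleaner can then remove the corresponding shuffle and
        # broadcast files from the workers' appcache directories.
        del model
        sc._jvm.System.gc()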
On Mon, Feb 16, 2015 at 4:08 PM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

> thanks, that looks promising but I can't find any reference giving me more
> details - can you please point me to something? Also, is it possible to
> force GC from pyspark (as I am using pyspark)?
>
> thanks,
> Antony.
>
> On Monday, 16 February 2015, 21:05, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>
> > Correct, brute-force cleanup is not useful. Since Spark 1.0, Spark can do
> > automatic cleanup of files based on which RDDs are used/garbage collected
> > by the JVM. That would be the best way, but it depends on the JVM GC
> > characteristics. If you force a GC periodically in the driver, that might
> > help you get rid of files in the workers that are no longer needed.
> >
> > TD
> >
> > On Mon, Feb 16, 2015 at 12:27 AM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:
> >
> > > spark.cleaner.ttl is not the right way - it seems to be really designed
> > > for streaming. Although it keeps the disk usage under control, it also
> > > causes loss of RDDs and broadcasts that are required later, leading to
> > > a crash.
> > >
> > > is there any other way?
> > > thanks,
> > > Antony.
> > >
> > > On Sunday, 15 February 2015, 21:42, Antony Mayi <antonym...@yahoo.com> wrote:
> > >
> > > > spark.cleaner.ttl ?
> > > >
> > > > On Sunday, 15 February 2015, 18:23, Antony Mayi <antonym...@yahoo.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am running a bigger ALS job on Spark 1.2.0 on YARN (CDH 5.3.0) -
> > > > > the ALS is using about 3 billion ratings and I am doing several
> > > > > trainImplicit() runs in a loop within one Spark session. I have a
> > > > > four-node cluster with 3TB of disk space on each. Before starting
> > > > > the job, less than 8% of the disk space is used. While the ALS is
> > > > > running, I can see the disk usage growing rapidly, mainly because
> > > > > of files being stored under
> > > > > yarn/local/usercache/user/appcache/application_XXX_YYY/spark-local-ZZZ-AAA.
> > > > > After about 10 hours the disk usage hits 90% and YARN kills the
> > > > > particular containers.
> > > > >
> > > > > Am I missing some cleanup somewhere while looping over the several
> > > > > trainImplicit() calls? Taking 4*3TB of disk space seems immense.
> > > > >
> > > > > thanks for any help,
> > > > > Antony.