Re: /tmp directory fills up

2015-01-12 Thread Marcelo Vanzin
Hi Alessandro,

You can look for a log line like this in your driver's output:
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local directory at /data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d

If you're deploying your application in cluster mode, the temp
directory will be under the Yarn-defined application dir. In client
mode, the driver will create some stuff under spark.local.dir, but the
driver itself generally doesn't create many temp files IIRC.
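
As a quick check, something like the following pulls that directory out of a driver log. This is only a sketch: "driver.log" is a placeholder for wherever your driver output lands, and the sample line is the one quoted above.

```shell
# Sketch: recover Spark's scratch directory from a driver log.
# "driver.log" is a placeholder; point it at your real driver output.
# The sample line below is copied from the log excerpt above.
cat > driver.log <<'EOF'
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local directory at /data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d
EOF

# Extract just the directory path from the DiskBlockManager line
grep -o 'Created local directory at .*' driver.log | awk '{print $NF}'
```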


On Fri, Jan 9, 2015 at 11:32 PM, Alessandro Baretta
alexbare...@gmail.com wrote:
 Gents,

 I'm building Spark from the current master branch and deploying it to
 Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop
 cluster provisioning tool. bdutil configures Spark with

 spark.local.dir=/hadoop/spark/tmp

 but this option is ignored when running on YARN. bdutil also configures
 YARN with:

   <property>
     <name>yarn.nodemanager.local-dirs</name>
     <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
     <description>
       Directories on the local machine in which to store application temp files.
     </description>
   </property>

 This is the right directory for Spark to store temporary data in. Still,
 Spark is creating directories such as:

 /tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e

 and filling them with gigabytes' worth of output files, which overflows the
 very small root filesystem.

 How can I diagnose why my Spark installation is not picking up
 yarn.nodemanager.local-dirs from YARN?

 Alex



-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



/tmp directory fills up

2015-01-09 Thread Alessandro Baretta
Gents,

I'm building Spark from the current master branch and deploying it to
Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop
cluster provisioning tool. bdutil configures Spark with

spark.local.dir=/hadoop/spark/tmp

but this option is ignored when running on YARN. bdutil also
configures YARN with:

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
    <description>
      Directories on the local machine in which to store application temp files.
    </description>
  </property>

This is the right directory for Spark to store temporary data in. Still,
Spark is creating directories such as:

/tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e

and filling them with gigabytes' worth of output files, which overflows the
very small root filesystem.

How can I diagnose why my Spark installation is not picking up
yarn.nodemanager.local-dirs from YARN?

Alex
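
One place to start the diagnosis asked for above is confirming that the yarn-site.xml actually deployed to each node carries the property. The sketch below works on a stand-in copy of the file; on a real node you would grep the installed config instead (often /etc/hadoop/conf/yarn-site.xml, though the path varies by distribution).

```shell
# Sketch: verify the deployed yarn-site.xml carries the local-dirs setting.
# This stand-in file mirrors the bdutil config quoted above; on a real
# node, grep the installed yarn-site.xml instead (path varies).
cat > yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
  </property>
</configuration>
EOF

# Show the property and the line after it (its value)
grep -A1 'yarn.nodemanager.local-dirs' yarn-site.xml
```

If the property is present on every node but executors still write under /tmp, the next step is checking which process is creating the directories, since (per the reply above) a client-mode driver does not get its temp directory from YARN.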