Gents,

I'm building Spark from the current master branch and deploying it to Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop cluster provisioning tool. bdutil configures Spark with
spark.local.dir=/hadoop/spark/tmp, but this option is ignored when running on YARN. bdutil also configures YARN with:

<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
  <description>
    Directories on the local machine in which to store application temp files.
  </description>
</property>

This is the right directory for Spark to store its temporary data in. Still, Spark is creating directories such as /tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e and filling them with gigabytes worth of output files, which fills up the very small root filesystem.

How can I diagnose why my Spark installation is not picking up yarn.nodemanager.local-dirs from YARN?

Alex
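P.S. For reference, this is the kind of check I've been running on a worker node. It's only a rough sketch: the /etc/hadoop/conf fallback, the use of HADOOP_CONF_DIR, and the /tmp/spark-* pattern are assumptions based on my bdutil layout, so adjust the paths for yours.

```shell
#!/bin/sh
# Rough diagnostic sketch (paths are assumptions from the bdutil setup above).

# Locate the yarn-site.xml the NodeManager should be reading.
CONF="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/yarn-site.xml"

if [ -f "$CONF" ]; then
    # Show the local-dirs value YARN actually has configured.
    grep -A 2 'yarn.nodemanager.local-dirs' "$CONF" \
        || echo "yarn.nodemanager.local-dirs not set in $CONF"
else
    echo "yarn-site.xml not found at $CONF"
fi

# List any stray Spark scratch directories landing on the root filesystem.
ls -d /tmp/spark-* 2>/dev/null || echo "no /tmp/spark-* dirs found"
```

Comparing the configured value against where the scratch directories actually appear at least narrows down whether YARN's config or Spark's is the one being ignored.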