Take a look at this: http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf (linked from that article) to get a better idea of what your options are. If it's possible to avoid writing to [any] disk I'd recommend that route, since that's the performance advantage Spark has over vanilla Hadoop.

On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein <tjkl...@gmail.com> wrote:

> Thanks for the info. The file system in use is a Lustre file system.
>
> Best,
> Tassilo
>
> On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke <charles.fed...@gmail.com> wrote:
>
>> A central location, such as NFS?
>>
>> If they are temporary for the purpose of further job processing you'll
>> want to keep them local to the node in the cluster, i.e., in /tmp. If they
>> are centralized you won't be able to take advantage of data locality and
>> the central file store will become a bottleneck for further processing.
>>
>> If /tmp isn't an option because you want to be able to monitor the file
>> outputs as they occur you can also use HDFS (assuming your Spark nodes are
>> also HDFS members they will benefit from data locality).
>>
>> It looks like the problem you are seeing is that a lock cannot be
>> acquired on the output file in the central file system.
>>
>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein <tjkl...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a
>>> different path than the local directory.
>>>
>>> On our cluster we have a folder for temporary files (in a central file
>>> system), which is called /scratch.
>>>
>>> When setting SPARK_LOCAL_DIRS=/scratch/<node name>
>>>
>>> I get:
>>>
>>> An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>> in stage 0.0 (TID 3, XXXXXXX): java.io.IOException: Function not implemented
>>>     at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
>>>     at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
>>>     at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
>>>     at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
>>>     at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
>>>
>>> Using SPARK_LOCAL_DIRS=/tmp, however, works perfectly. Any idea?
>>>
>>> Best,
>>> Tassilo
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-LOCAL-DIRS-Issue-tp21602.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
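For reference, the setting under discussion is normally placed in conf/spark-env.sh on each worker. A minimal sketch of the two choices the thread compares — node-local /tmp versus a per-node directory on the shared /scratch mount (the /scratch path and the use of `hostname` are assumptions for illustration, not from the original posts):

```shell
# conf/spark-env.sh — sketch, not a verified production config.
# Option 1: node-local disk (what ended up working in this thread).
export SPARK_LOCAL_DIRS=/tmp

# Option 2: per-node subdirectory on a shared scratch file system.
# This is the variant that failed here with "Function not implemented",
# because Spark takes a file lock in this directory (Utils.fetchFile)
# and the shared mount did not support that lock operation.
# export SPARK_LOCAL_DIRS=/scratch/$(hostname)
```

Multiple comma-separated directories are also accepted, which lets Spark spread shuffle and spill files across several local disks.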
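The "Function not implemented" in the trace above is the kernel's ENOSYS surfacing through java.nio.channels.FileChannel.lock: Spark locks a file in SPARK_LOCAL_DIRS, and a network file system mounted without lock support (e.g. Lustre without its flock mount option) rejects the call. A quick way to probe a candidate directory before pointing SPARK_LOCAL_DIRS at it is to attempt a POSIX record lock there — `fcntl.lockf` is the closest Python analogue to what the JVM does; this is an illustrative sketch, not part of Spark:

```python
import errno
import fcntl
import os
import tempfile


def supports_posix_lock(directory):
    """Return True if a file in `directory` can take a POSIX record lock.

    Creates a throwaway temp file, tries a non-blocking exclusive lock
    (the same fcntl-based mechanism the JVM uses for FileChannel.lock
    on Linux), and treats ENOSYS/ENOLCK/EOPNOTSUPP as "not supported" --
    the failure mode seen on lock-less network mounts.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
        except OSError as e:
            if e.errno in (errno.ENOSYS, errno.ENOLCK, errno.EOPNOTSUPP):
                return False
            raise  # unrelated error (permissions, read-only fs, ...)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # A local file system such as /tmp should normally support locking.
    print(supports_posix_lock(tempfile.gettempdir()))
```

Running this on each candidate SPARK_LOCAL_DIRS path (local /tmp vs. the shared /scratch mount) would distinguish the two cases seen in this thread before a job fails four tasks deep.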