Take a look at this:

http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre

Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
(linked from that article)

to get a better idea of what your options are.

If it's possible to avoid writing to [any] disk, I'd recommend that route,
since avoiding disk I/O is the performance advantage Spark has over vanilla
Hadoop.
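For what it's worth, the "Function not implemented" error quoted later in this thread comes out of Java's FileChannel.lock. As a rough sketch (not Spark's actual code), the same failure mode can be reproduced with a plain POSIX lock call, using Python's fcntl.lockf as a stand-in: on a Lustre mount without the flock/localflock mount option the kernel typically returns ENOSYS, while on a local filesystem the lock succeeds.

```python
import errno
import fcntl
import tempfile

def try_lock(path):
    """Attempt an exclusive POSIX lock on path; report the outcome.

    A stand-in for the lock Spark's Utils.fetchFile takes, not Spark's
    actual code path.
    """
    with open(path, "w") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX)
            return "locked"
        except OSError as e:
            if e.errno == errno.ENOSYS:
                # What a Lustre mount without flock support reports:
                # "Function not implemented"
                return "Function not implemented"
            raise

# On a local filesystem such as /tmp the lock should succeed.
with tempfile.NamedTemporaryFile(dir="/tmp") as tmp:
    print(try_lock(tmp.name))
```

If the repro prints "Function not implemented" on your /scratch mount, remounting Lustre with the flock option (or pointing SPARK_LOCAL_DIRS at node-local storage) would be the two obvious fixes.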

On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein <tjkl...@gmail.com> wrote:

> Thanks for the info. The file system in use is a Lustre file system.
>
> Best,
>  Tassilo
>
> On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke <charles.fed...@gmail.com
> > wrote:
>
>> A central location, such as NFS?
>>
>> If they are temporary for the purpose of further job processing you'll
>> want to keep them local to the node in the cluster, i.e., in /tmp. If they
>> are centralized you won't be able to take advantage of data locality and
>> the central file store will become a bottleneck for further processing.
>>
>> If /tmp isn't an option because you want to be able to monitor the file
>> outputs as they occur you can also use HDFS (assuming your Spark nodes are
>> also HDFS members they will benefit from data locality).
>>
>> It looks like the problem you are seeing is that a lock cannot be
>> acquired on the output file in the central file system.
>>
>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein <tjkl...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Using Spark 1.2, I ran into issues setting SPARK_LOCAL_DIRS to a path
>>> other than a local directory.
>>>
>>> On our cluster we have a folder for temporary files (in a central file
>>> system), which is called /scratch.
>>>
>>> When setting SPARK_LOCAL_DIRS=/scratch/<node name>
>>>
>>> I get:
>>>
>>> An error occurred while calling
>>> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>> in stage 0.0 (TID 3, XXXXXXX): java.io.IOException: Function not
>>> implemented
>>>         at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
>>>         at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
>>>         at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
>>>         at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
>>>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
>>>
>>> Using SPARK_LOCAL_DIRS=/tmp, however, works perfectly. Any idea?
>>>
>>> Best,
>>>  Tassilo
>>>
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context: http://apache-spark-user-list.
>>> 1001560.n3.nabble.com/SPARK-LOCAL-DIRS-Issue-tp21602.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>
