We run Spark (in standalone mode) on top of a network-mounted file system (NFS), rather than HDFS, and find that it works great. Setting this up required no modification or special configuration; as Matei says, we just point Spark at the data using its file location.
-- Jeremy

On Apr 4, 2014, at 8:12 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> As long as the filesystem is mounted at the same path on every node, you
> should be able to just run Spark and use a file:// URL for your files.
>
> The only downside with running it this way is that Lustre won’t expose data
> locality info to Spark, the way HDFS does. That may not matter if it’s a
> network-mounted file system though.
>
> Matei
>
> On Apr 4, 2014, at 4:56 PM, Venkat Krishnamurthy <ven...@yarcdata.com> wrote:
>
>> All
>>
>> Are there any drawbacks or technical challenges (or any information, really)
>> related to using Spark directly on a global parallel filesystem like
>> Lustre/GPFS?
>>
>> Any idea of what would be involved in doing a minimal proof of concept? Is
>> it just possible to run Spark unmodified (without the HDFS substrate) for a
>> start, or will that not work at all? I do know that it’s possible to
>> implement Tachyon on Lustre and get the HDFS interface – just looking at
>> other options.
>>
>> Venkat
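For anyone reading the archive: a minimal sketch of the proof of concept discussed above. It assumes a standalone cluster whose workers all mount the shared filesystem at the same path; the master host `master:7077`, the mount point `/mnt/lustre`, and the file names are hypothetical placeholders, not anything from this thread.

```scala
// Minimal Spark job reading from a shared POSIX filesystem (no HDFS).
// Works only if every node sees the filesystem at the same mount path.
import org.apache.spark.{SparkConf, SparkContext}

object SharedFsPoc {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SharedFsPoc")
      .setMaster("spark://master:7077") // standalone master (hypothetical host)
    val sc = new SparkContext(conf)

    // Point Spark at the data with a file:// URL instead of hdfs://.
    val lines = sc.textFile("file:///mnt/lustre/data/input.txt")
    println(s"line count: ${lines.count()}")

    // Writing works the same way, as long as the output path is on the
    // shared mount so the driver and all executors can reach it.
    lines.filter(_.nonEmpty).saveAsTextFile("file:///mnt/lustre/data/output")

    sc.stop()
  }
}
```

As Matei notes, the trade-off is that Spark gets no data-locality hints this way, which typically matters little when the storage is network-mounted anyway.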