Hi,

I got MapReduce working with an NFS filesystem pretty easily in the end
(thanks to Jeff Ritchie) by setting the following properties in
hadoop-site.xml:

fs.default.name = local (doesn't use DFS at all)
mapred.local.dir = /tmp/hadoop/mapred/local (on the local hard disk of the
workers)
mapred.system.dir = /nfs-share/hadoop/tmp/mapred/system (on the NFS disk)
mapred.temp.dir = /nfs-share/hadoop/tmp/temp
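
For reference, these are the corresponding entries in my
conf/hadoop-site.xml (a minimal sketch; substitute your own paths):

  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>local</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/tmp/hadoop/mapred/local</value>
    </property>
    <property>
      <name>mapred.system.dir</name>
      <value>/nfs-share/hadoop/tmp/mapred/system</value>
    </property>
    <property>
      <name>mapred.temp.dir</name>
      <value>/nfs-share/hadoop/tmp/temp</value>
    </property>
  </configuration>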

As Doug pointed out, you need to specify *full paths* to the input and
output data, e.g.:

bin/hadoop org.apache.hadoop.examples.WordCount /nfs-share/hadoop/in
/nfs-share/hadoop/out

The WordCount example works fine but the Grep example does not.  I think
this is because the Grep example runs two jobs (a grep job and a sort job).
The grep job works fine (provided that full paths are specified for the
input and output data) but the sort job fails - I think this is because
the system does not specify full paths for the intermediate files that the
sort job reads.  Would this be easy to fix?
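
For reference, I am invoking Grep in the same way as WordCount (the regex
argument here is just an example):

  bin/hadoop org.apache.hadoop.examples.Grep /nfs-share/hadoop/in
  /nfs-share/hadoop/out 'dfs[a-z.]+'

As I understand it, the grep job writes its matches to a temporary
directory and the sort job then reads them back, so my guess is that this
temporary directory is created with a relative path that the sort job
cannot resolve when fs.default.name is "local".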

Thanks,
Jon

P.S. Should this conversation be moved to the hadoop mailing list?

--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: [EMAIL PROTECTED]
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------  

> -----Original Message-----
> From: Raghavendra Prabhu [mailto:[EMAIL PROTECTED] 
> Sent: 02 March 2006 06:44
> To: [email protected]
> Subject: Re: Hadoop MapReduce: using NFS as the filesystem
> 
> Hi Jon
> 
> The thing is, when you mount a filesystem and the mapred directory is
> present on the mounted space, Hadoop will write to that folder,
> mimicking network writes.  So you can have this mounted on the
> tasktrackers, I guess.
> 
> Am I right, guys?
> 
> The DFS manages content without requiring any particular filesystem in
> place.  It indirectly mimics a networked file system on top of your
> existing one.
> 
> Hope that answers your question.  Please correct me if I am wrong.
> 
> Rgds
> Prabhu
> 
> 
> On 3/1/06, Jon Blower <[EMAIL PROTECTED]> wrote:
> >
> >
> > >
> > > Stefan Groschupf wrote:
> > > > In general, Hadoop's tasktrackers and jobtrackers require a
> > > > switched-on DFS to run.
> > >
> > > Stefan: that should not be the case.  One should be able to run
> > > things entirely out of the "local" filesystem.  Absolute pathnames
> > > may be required for input and output directories, but that's a bug
> > > that we can fix.
> > >
> >
> > Just to be clear - does this mean that I don't have to run DFS at all,
> > and I can get all input data from (and write all output data to) an
> > NFS drive?  DFS is unnecessary for my particular app (unless it brings
> > other benefits that I'm not aware of).
> >
> > Jon
> >
> >
> 


