Hi,

I got MapReduce working with an NFS filesystem pretty easily in the end (thanks to Jeff Ritchie) by having the following properties set in hadoop-site.xml:

fs.default.name = local (doesn't use DFS at all)
mapred.local.dir = /tmp/hadoop/mapred/local (on the local hard disk of the workers)
mapred.system.dir = /nfs-share/hadoop/tmp/mapred/system (on the NFS disk)
mapred.temp.dir = /nfs-share/hadoop/tmp/temp

As Doug pointed out, you need to specify *full paths* to input and output data, e.g.:

bin/hadoop org.apache.hadoop.examples.WordCount /nfs-share/hadoop/in /nfs-share/hadoop/out

The WordCount example works fine but the Grep example does not. I think this is because the Grep example runs two jobs (a grep job and a sort job). The grep job works fine (provided that full paths are specified for input and output data) but the sort job does not - I think this is because the system does not specify full paths for the files for the sort job. Would it be easy to fix this?

Thanks, Jon

P.S. Should this conversation be moved to the hadoop mailing list?

--------------------------------------------------------------
Dr Jon Blower              Tel: +44 118 378 5213 (direct line)
Technical Director         Tel: +44 118 378 8741 (ESSC)
Reading e-Science Centre   Fax: +44 118 378 6413
ESSC                       Email: [EMAIL PROTECTED]
University of Reading
3 Earley Gate
Reading RG6 6AL, UK
--------------------------------------------------------------

> -----Original Message-----
> From: Raghavendra Prabhu [mailto:[EMAIL PROTECTED]
> Sent: 02 March 2006 06:44
> To: [email protected]
> Subject: Re: Hadoop MapReduce: using NFS as the filesystem
>
> Hi Jon
>
> The thing is when you mount a system and the mapred directory
> is present on the mounted space
>
> it will write to that folder mimicking network writes
>
> So you can have this mounted in the task trackers i guess.
>
> Am i right guys?
>
> The dfs is managing content without having any filesystem in
> place. It indirectly mimicks a networked file system on top
> of your existing one.
>
> Hope that answers your question.
> Please correct me if i am wrong
>
> Rgds
> Prabhu
>
>
> On 3/1/06, Jon Blower <[EMAIL PROTECTED]> wrote:
> >
> > > Stefan Groschupf wrote:
> > > > in general
> > > > hadoop's tasktracks and jobtrackers require to run with a
> > > > switched-on dfs.
> > >
> > > Stefan: that should not be the case. One should be able to run
> > > things entirely out of the "local" filesystem. Absolute pathnames
> > > may be required for input and output directories, but that's a bug
> > > that we can fix.
> > >
> >
> > Just to be clear - does this mean that I don't have to run DFS at all,
> > and I can get all input data from (and write all output data to) an
> > NFS drive?
> > DFS is unnecessary for my particular app (unless it brings other
> > benefits that I'm not aware of).
> >
> > Jon
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
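
For reference, the four properties listed in Jon's message could be written in hadoop-site.xml roughly as follows. This is only a sketch: it assumes the usual <configuration>/<property> layout of Hadoop site files, the values are copied from the message, and the /nfs-share mount point is specific to Jon's setup.

```xml
<?xml version="1.0"?>
<!-- Sketch of hadoop-site.xml for running MapReduce without DFS.
     Values are taken from the message above; /nfs-share is assumed to be
     an NFS mount visible at the same path on every node. -->
<configuration>
  <!-- "local" selects the local filesystem, so DFS is not used at all -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <!-- Scratch space on each worker's own local hard disk -->
  <property>
    <name>mapred.local.dir</name>
    <value>/tmp/hadoop/mapred/local</value>
  </property>
  <!-- Shared system directory on the NFS disk -->
  <property>
    <name>mapred.system.dir</name>
    <value>/nfs-share/hadoop/tmp/mapred/system</value>
  </property>
  <!-- Shared temporary directory, also on the NFS disk -->
  <property>
    <name>mapred.temp.dir</name>
    <value>/nfs-share/hadoop/tmp/temp</value>
  </property>
</configuration>
```

With a file like this in place, input and output directories still have to be given as full paths, as in the WordCount command in the message.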
