Thank you for the answer. I take it that after changing mapred.job.tracker from "local" to localhost:<port>, I now have to start up the Hadoop daemons so they are listening on that port? I just ran the crawl script that was working before, and it gave me a "connection refused" error.
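For anyone searching the archives later, here is a sketch of what I believe the relevant entries in conf/hadoop-site.xml would look like for this pseudo-distributed-over-local-filesystem setup (the port 12345 is just an arbitrary choice above 1024, per the advice quoted below; this is my reading of the thread, not a tested config):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep the plain local filesystem; no HDFS needed on a single node. -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <!-- Run a real JobTracker instead of the in-process "local" runner.
       Port is arbitrary (> 1024). -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:12345</value>
  </property>
</configuration>
```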
I just found this Hadoop distributed guide ( http://hadoop.apache.org/common/docs/r0.19.2/quickstart.html#Local ) that mentions running bin/start-all.sh. When I tried running that script, it said I need to configure the masters and slaves conf files, so I am working on that.

On Tue, Sep 28, 2010 at 3:15 PM, Andrzej Bialecki <[email protected]> wrote:
> On 2010-09-28 21:09, Steve Cohen wrote:
>
>> How does one set up pseudo distributed with a local filesystem? Are you
>> saying fs.default.name can be left as file:/// instead of being set to
>> hdfs://?
>
> Yes. The whole idea of a distributed filesystem is that mapreduce tasks
> that run on possibly different machines need to access the same filesystem
> namespace and the same filesystem objects from different locations. This
> condition is satisfied in a single-node setup, since all tasks run on the
> same machine with the same local filesystem.
>
>> Do you then set mapred.job.tracker to file:/// as well?
>
> No, that would be an invalid value no matter what... the proper values for
> mapred.job.tracker are either the magic value "local" or a pair of
> "hostname:port" - in this case, since you want to run a real JobTracker,
> you need to set it to "localhost:12345" (i.e. an arbitrary port number > 1024).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
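Following up on the masters/slaves configuration: from the quickstart guide linked above, my understanding (not something confirmed in this thread) is that for a single-node setup both files simply list the one host, e.g.:

```
conf/masters:
localhost

conf/slaves:
localhost
```

With those in place, bin/start-all.sh should bring up the NameNode/JobTracker and worker daemons, and bin/stop-all.sh shuts them down again.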

