Thank you for the answer. I take it that after changing mapred.job.tracker from "local" to localhost:<port>, I now have to start up the Hadoop daemons so they are listening on that port? I just ran the crawl script that was working before, and it gave me a "connection refused" error.
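For anyone searching the archives later, here is a sketch of what I believe the relevant entries in conf/hadoop-site.xml would look like for this pseudo-distributed-over-local-filesystem setup (the port 12345 is just an arbitrary choice above 1024, per the advice quoted below; this is my reading of the thread, not a tested config):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep the plain local filesystem; no HDFS needed on a single node. -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <!-- Run a real JobTracker instead of the in-process "local" runner.
       Port is arbitrary (> 1024). -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:12345</value>
  </property>
</configuration>
```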
I just found this Hadoop distributed guide ( http://hadoop.apache.org/common/docs/r0.19.2/quickstart.html#Local ) that mentions running bin/start-all.sh. When I tried running that script, it said I need to configure the masters and slaves conf files, so I am working on that.

On Tue, Sep 28, 2010 at 3:15 PM, Andrzej Bialecki <[email protected]> wrote:
> On 2010-09-28 21:09, Steve Cohen wrote:
>
>> How does one set up pseudo distributed with a local filesystem? Are you
>> saying fs.default.name can be left as file:/// instead of being set to
>> hdfs://?
>
> Yes. The whole idea of a distributed filesystem is that mapreduce tasks
> that run on possibly different machines need to access the same filesystem
> namespace and the same filesystem objects from different locations. This
> condition is satisfied in a single-node setup, since all tasks run on the
> same machine with the same local filesystem.
>
>> Do you then set mapred.job.tracker to file:/// as well?
>
> No, that would be an invalid value no matter what... the proper values for
> mapred.job.tracker are either the magic value "local" or a pair of
> "hostname:port" - in this case, since you want to run a real JobTracker,
> you need to set it to "localhost:12345" (i.e. an arbitrary port number > 1024).
>
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
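Following up on the masters/slaves configuration: from the quickstart guide linked above, my understanding (not something confirmed in this thread) is that for a single-node setup both files simply list the one host, e.g.:

```
conf/masters:
localhost

conf/slaves:
localhost
```

With those in place, bin/start-all.sh should bring up the NameNode/JobTracker and worker daemons, and bin/stop-all.sh shuts them down again.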

