Hi there,
Thanks for replying again. What volume of data are you crawling, and on how
many machines? Which version of Nutch are you using, 0.7.1 or another?
Actually, it is working more or less fine, but I want to know how many
machines I will need to crawl 20,000 websites in a day. If anyone can give me
any information on this, I would really appreciate it.

Thanks
Pushpesh


On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> I have had no problem doing a distributed crawl.
>
> On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > Hi NN,
> >
> > Thanks for replying. Actually, I wanted to know whether distributed
> > crawling in Nutch is working fine, and how successfully. I have managed
> > to set up a distributed crawl on 2 machines (1 master and 1 slave), but
> > when I try with more than two machines there seems to be a problem,
> > especially while injecting URLs into the crawlDB.
>
> Could you please post your log files, for example the jobtracker
> and tasktracker log files?
>
> > So I was wondering whether anyone has succeeded in doing a massive
> > crawl with Nutch involving millions of pages?
> >
> > My requirement is to crawl about 20,000 websites (to, say, depth 5) in a
> > day, and I was wondering how many machines that would require.
> >
> > I would truly appreciate any response on this.
> >
> > Thanks In Advance
> > Pushpesh
> >
> >
> > On 12/28/05, Nutch Newbie <[EMAIL PROTECTED]> wrote:
> > >
> > > Have you tried the following:
> > >
> > > http://wiki.apache.org/nutch/HardwareRequirements
> > >
> > > and
> > >
> > > http://wiki.apache.org/nutch/
> > >
> > > There is no quick answer if one is planning to crawl millions of
> > > pages. Read... Try... Read...
> > >
> > >
> > > On 12/28/05, Pushpesh Kr. Rajwanshi <[EMAIL PROTECTED]> wrote:
> > > > Hi,
> > > >
> > > > I want to know whether anyone has been able to run a distributed
> > > > crawl on multiple machines involving millions of pages, and how hard
> > > > it is to do that. Do I just have to do some configuration and setup,
> > > > or do I need to implement something as well?
> > > >
> > > > Also, can anyone tell me whether it is possible to crawl around 20,000
> > > > websites (say, to depth 5) in a day, and if so, roughly how many
> > > > machines I would need and what configuration? I would appreciate even
> > > > very approximate numbers, as I understand it might not be trivial to
> > > > find out, or maybe it is :-)
> > > >
> > > > TIA
> > > > Pushpesh
> > > >
> > > >
> > >
> >
> >
>
