Re: Continuous crawling

Bai Shen Mon, 28 Nov 2011 06:10:21 -0800

We looked at the Hadoop reporter but aren't sure how to access it from
Nutch.  Is there a particular way it works?  Can you give me an example?
Thanks.


On Mon, Nov 21, 2011 at 3:11 PM, Markus Jelsma
<[email protected]> wrote:

> > On Thu, Nov 10, 2011 at 3:32 PM, Markus Jelsma
> > <[email protected]> wrote:
> >
> > > > > Interesting. How do you tell if the segments have been fetched, etc?
> > > >
> > > > after a job the shell script waits for its completion and return code.
> > > > If it returns 0 all is fine and we move it to another queue. If != 0
> > > > then there's an error and reports via mail.
> > > >
> > > > Ah, okay. I didn't realize it was returning an error code.
> > > >
> > > > > How do you know if there are any urls that had problems?
> > > >
> > > > Hadoop reporter shows statistics. There are always many errors for many
> > > > reasons. This is normal because we crawl everything.
> >
> > How are you running Hadoop reporter?
>
> You'll get it for free when operating a Hadoop cluster.
>
> > > > > Or fetch jobs that errored out, etc.
> > > >
> > > > The non-zero return code.
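
For reference, the return-code check described in the quoted text (wait for
the job, treat 0 as success, report anything else) could be sketched roughly
as below. This is a minimal Python sketch under assumptions, not the script
used in this thread: the names run_step and report are hypothetical, and
the mail report is replaced by a message on stderr.

```python
import subprocess
import sys


def run_step(command, on_failure=None):
    """Run one crawl step, wait for it to finish, and check its exit code.

    Returns True when the command exits with 0, False otherwise.
    `command` is a list of arguments, as accepted by subprocess.run.
    """
    result = subprocess.run(command)  # blocks until the job completes
    if result.returncode == 0:
        return True
    if on_failure is not None:
        # Hypothetical hook: the thread mentions a mail report here.
        on_failure(command, result.returncode)
    return False


def report(command, code):
    # Stand-in for the mail report: just write the failure to stderr.
    print(f"step failed with exit code {code}: {' '.join(command)}",
          file=sys.stderr)
```

A caller would pass the actual job command, for example
run_step(["bin/nutch", "fetch", segment_dir], on_failure=report), where
segment_dir is a placeholder for a real segment path, and would move the
segment to the next queue only when the call returns True.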
