Thanks, Tejas.

Yes, I checked the logs and no errors appear in them.

I left http.content.limit and parser.html.impl at their default
values...
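(For anyone following along: overrides to those two properties would go in conf/nutch-site.xml, which takes precedence over nutch-default.xml. The values below are illustrative alternatives, not the shipped defaults.)

```xml
<!-- conf/nutch-site.xml : properties here override nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Max bytes fetched per document; -1 disables truncation.
         Truncated pages can lose outlinks, so the crawl stops growing. -->
    <value>-1</value>
  </property>
  <property>
    <name>parser.html.impl</name>
    <!-- HTML parser implementation to use: "neko" or "tagsoup". -->
    <value>tagsoup</value>
  </property>
</configuration>
```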

Benjamin
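P.S. For reference, the generate/fetch/parse/updatedb rounds quoted below can be scripted as a loop. This is only a minimal sketch of that sequence; the round count, -topN value, and the DRYRUN echo helper are my own additions, not something from the thread:

```shell
#!/bin/sh
# Sketch of the repeated Nutch crawl cycle from the thread
# (run bin/nutch inject <seed-dir> once beforehand).
# run() echoes the command instead of executing it when DRYRUN is set,
# so the sequence can be inspected safely.
run() { if [ -n "$DRYRUN" ]; then echo "$@"; else "$@"; fi; }

# crawl N : perform N generate/fetch/parse/updatedb rounds.
crawl() {
  rounds="${1:-2}"
  i=1
  while [ "$i" -le "$rounds" ]; do
    run bin/nutch generate -topN 1000
    run bin/nutch fetch -all
    run bin/nutch parse -all
    run bin/nutch updatedb
    i=$((i + 1))
  done
}
```

With DRYRUN=1 set, `crawl 1` just prints the four commands for one round instead of running them.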


On Tue, Jun 25, 2013 at 6:14 PM, Tejas Patil <[email protected]> wrote:

> Did you check the logs (NUTCH_HOME/logs/hadoop.log) for any exception or
> error messages ?
> Also you might have a look at these configs in nutch-site.xml (default
> values are in nutch-default.xml):
> http.content.limit and parser.html.impl
>
>
> On Tue, Jun 25, 2013 at 7:04 AM, Sznajder ForMailingList <
> [email protected]> wrote:
>
> > Hello
> >
> > I installed Nutch 2.2 on my linux machine.
> >
> > I defined the seed directory with one file containing:
> > http://en.wikipedia.org/
> > http://edition.cnn.com/
> >
> >
> > I ran the following:
> > sh bin/nutch inject ~/DataExplorerCrawl_gpfs/seed/
> >
> > After this step:
> > the call
> > -bash-4.1$ sh bin/nutch readdb -stats
> >
> > returns
> > TOTAL urls:     2
> > status 0 (null):        2
> > avg score:      1.0
> >
> >
> > Then, I ran the following:
> > bin/nutch generate -topN 10
> > bin/nutch fetch -all
> > bin/nutch parse -all
> > bin/nutch updatedb
> > bin/nutch generate -topN 1000
> > bin/nutch fetch -all
> > bin/nutch parse -all
> > bin/nutch updatedb
> >
> >
> > However, after these steps the stats call still returns:
> > -bash-4.1$ sh bin/nutch readdb -stats
> > status 5 (status_redir_perm):   1
> > max score:      2.0
> > TOTAL urls:     3
> > avg score:      1.3333334
> >
> >
> >
> > Only 3 urls?!
> > What am I missing?
> >
> > thanks
> >
> > Benjamin
> >
>
