Thanks Tejas. Yes, I checked the logs and no error appears in them.
I left http.content.limit and parser.html.impl at their default values...

Benjamin

On Tue, Jun 25, 2013 at 6:14 PM, Tejas Patil <[email protected]> wrote:

> Did you check the logs (NUTCH_HOME/logs/hadoop.log) for any exception or
> error messages?
> Also you might have a look at these configs in nutch-site.xml (default
> values are in nutch-default.xml):
> http.content.limit and parser.html.impl
>
>
> On Tue, Jun 25, 2013 at 7:04 AM, Sznajder ForMailingList <
> [email protected]> wrote:
>
> > Hello
> >
> > I installed Nutch 2.2 on my Linux machine.
> >
> > I defined the seed directory with one file containing:
> > http://en.wikipedia.org/
> > http://edition.cnn.com/
> >
> > I ran the following:
> > sh bin/nutch inject ~/DataExplorerCrawl_gpfs/seed/
> >
> > After this step, the call
> > -bash-4.1$ sh bin/nutch readdb -stats
> > returns
> > TOTAL urls: 2
> > status 0 (null): 2
> > avg score: 1.0
> >
> > Then, I ran the following:
> > bin/nutch generate -topN 10
> > bin/nutch fetch -all
> > bin/nutch parse -all
> > bin/nutch updatedb
> > bin/nutch generate -topN 1000
> > bin/nutch fetch -all
> > bin/nutch parse -all
> > bin/nutch updatedb
> >
> > However, the stats call after these steps still shows:
> > -bash-4.1$ sh bin/nutch readdb -stats
> > status 5 (status_redir_perm): 1
> > max score: 2.0
> > TOTAL urls: 3
> > avg score: 1.3333334
> >
> > Only 3 urls?! What am I missing?
> >
> > thanks
> >
> > Benjamin
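For anyone following along: overriding the two properties Tejas mentioned goes in conf/nutch-site.xml, which takes precedence over nutch-default.xml. A minimal sketch is below; the values shown are illustrative choices, not recommendations from this thread:

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties here override nutch-default.xml -->
<configuration>
  <!-- Maximum bytes downloaded per document; -1 means no limit.
       The default (65536) can truncate large pages before parsing. -->
  <property>
    <name>http.content.limit</name>
    <value>-1</value>
  </property>
  <!-- HTML parser implementation: "neko" (the default) or "tagsoup". -->
  <property>
    <name>parser.html.impl</name>
    <value>neko</value>
  </property>
</configuration>
```

After editing the file, the crawl commands need to be re-run for the new limits to take effect on fetched content.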

