Have you changed from the default MemStore gora storage to something else?

On Tuesday, June 25, 2013, Sznajder ForMailingList <[email protected]> wrote:
> Thanks Tejas
>
> Yes, I checked the logs and no error appears in them.
>
> I left http.content.limit and parser.html.impl at their default
> values...
>
> Benjamin
>
>
> On Tue, Jun 25, 2013 at 6:14 PM, Tejas Patil <[email protected]> wrote:
>
>> Did you check the logs (NUTCH_HOME/logs/hadoop.log) for any exception or
>> error messages?
>> Also you might have a look at these configs in nutch-site.xml (default
>> values are in nutch-default.xml):
>> http.content.limit and parser.html.impl
>>
>>
>> On Tue, Jun 25, 2013 at 7:04 AM, Sznajder ForMailingList <
>> [email protected]> wrote:
>>
>> > Hello
>> >
>> > I installed Nutch 2.2 on my Linux machine.
>> >
>> > I defined the seed directory with one file containing:
>> > http://en.wikipedia.org/
>> > http://edition.cnn.com/
>> >
>> >
>> > I ran the following:
>> > sh bin/nutch inject ~/DataExplorerCrawl_gpfs/seed/
>> >
>> > After this step:
>> > the call
>> > -bash-4.1$ sh bin/nutch readdb -stats
>> >
>> > returns
>> > TOTAL urls: 2
>> > status 0 (null): 2
>> > avg score: 1.0
>> >
>> >
>> > Then, I ran the following:
>> > bin/nutch generate -topN 10
>> > bin/nutch fetch -all
>> > bin/nutch parse -all
>> > bin/nutch updatedb
>> > bin/nutch generate -topN 1000
>> > bin/nutch fetch -all
>> > bin/nutch parse -all
>> > bin/nutch updatedb
>> >
>> >
>> > However, the stats call after these steps is still:
>> > the call
>> > -bash-4.1$ sh bin/nutch readdb -stats
>> > status 5 (status_redir_perm): 1
>> > max score: 2.0
>> > TOTAL urls: 3
>> > avg score: 1.3333334
>> >
>> >
>> >
>> > Only 3 urls?!
>> > What am I missing?
>> >
>> > Thanks
>> >
>> > Benjamin
>> >
>>
>
-- *Lewis*

