Nutch readdb shows much more fetched urls than parsed

mikaza Thu, 15 Dec 2011 07:28:13 -0800

I have about 2K links in a urls file, and I just need to load them into a
Solr/Lucene index (on a local machine).


I ran the inject/generate/fetch/parse cycle, and afterwards "bin/nutch
readseg -list" gave me these stats:

NAME            20111214182250
GENERATED       1851
FETCHER START   2011-12-14T18:24:08
FETCHER END     2011-12-14T19:52:25
FETCHED         3363
PARSED          275

So it parsed only 275 out of 3363 fetched. Is this normal for Nutch, and how
should I parse the unparsed data?

(Running "bin/nutch parse" on the segment again fails with a "Segment already
parsed" error.)
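
To put the counts above in proportion, here is a small arithmetic sketch. The numbers are copied from the readseg output in this post; the variable names are only illustrative and are not part of Nutch.

```python
# Counts copied from the "bin/nutch readseg -list" output quoted above.
stats = {"GENERATED": 1851, "FETCHED": 3363, "PARSED": 275}

# Share of fetched entries that were also parsed.
parsed_ratio = stats["PARSED"] / stats["FETCHED"]

# Fetched entries that have no parse result.
unparsed = stats["FETCHED"] - stats["PARSED"]

# Fetched entries in excess of what the generate step selected.
extra_fetched = stats["FETCHED"] - stats["GENERATED"]

print(f"parsed: {parsed_ratio:.1%} of fetched")
print(f"unparsed: {unparsed}")
print(f"fetched beyond generated: {extra_fetched}")
```

So roughly 8% of the fetched entries were parsed, and the FETCHED count is also noticeably higher than the GENERATED count.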

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-readdb-shows-much-more-fetched-urls-than-parsed-tp3588205p3588205.html
Sent from the Nutch - User mailing list archive at Nabble.com.
