I have about 2K links in a urls file, and I just need to load them into a Solr/Lucene index (on a local machine).
I ran the inject/generate/fetch/parse cycle, and afterwards "bin/nutch readseg -list" gave me these stats:

NAME            GENERATED  FETCHER START        FETCHER END          FETCHED  PARSED
20111214182250  1851       2011-12-14T18:24:08  2011-12-14T19:52:25  3363     275

So it parsed only 275 out of 3363 fetched. Is this normal for Nutch, and how should I parse the unparsed data? (Running "bin/nutch parse" on the segment again fails with a "Segment already parsed" error.)

--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-readdb-shows-much-more-fetched-urls-than-parsed-tp3588205p3588205.html
Sent from the Nutch - User mailing list archive at Nabble.com.

