You likely have a lot of fetched items that cannot be parsed. Check your URL filters and parse plugins.
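To see how large the gap is, the counts can be pulled out of the stats and compared. This is only a rough sketch: it assumes the one-field-per-line layout shown in the quoted message rather than the exact output of "bin/nutch readseg -list", and the names READSEG_OUTPUT, parse_counts and unparsed_summary are made up for illustration.

```python
import re

# Stats copied from the quoted message; the line layout is an assumption
# and may not match what your Nutch version prints.
READSEG_OUTPUT = """\
NAME 20111214182250
GENERATED 1851
FETCHER START 2011-12-14T18:24:08
FETCHER END 2011-12-14T19:52:25
FETCHED 3363
PARSED 275
"""


def parse_counts(text):
    """Collect the GENERATED, FETCHED and PARSED counters from the stats text."""
    counts = {}
    for line in text.splitlines():
        match = re.fullmatch(r"\s*(GENERATED|FETCHED|PARSED)\s+(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def unparsed_summary(counts):
    """Return (number of fetched-but-unparsed items, parsed/fetched ratio)."""
    fetched = counts["FETCHED"]
    parsed = counts["PARSED"]
    ratio = parsed / fetched if fetched else 0.0
    return fetched - parsed, ratio


counts = parse_counts(READSEG_OUTPUT)
unparsed, ratio = unparsed_summary(counts)
print(f"fetched={counts['FETCHED']} parsed={counts['PARSED']} "
      f"unparsed={unparsed} parsed_ratio={ratio:.1%}")
```

With the numbers from the message this reports 3088 unparsed items and a parsed ratio of about 8.2%, which is low enough to suggest a filter or parser configuration problem rather than a few bad documents.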

On Thursday 15 December 2011 11:39:21 mikaza wrote:
> I have about 2K links in urls file, and I just need to load them into
> solr/lucene index (on a local machine).
>
> I ran inject/generate/fetch/parse/ cycle, and after that in "bin/nutch
> readseg -list" I got these stats:
>
> NAME 20111214182250
> GENERATED 1851
> FETCHER START 2011-12-14T18:24:08
> FETCHER END 2011-12-14T19:52:25
> FETCHED 3363
> PARSED 275
>
> So it parsed only 275 out of 3363. Is it normal for nutch and how should I
> parse unparsed data?
>
> (subsequent "bin/nutch parse" exec on the segment leads to "Segment already
> parsed" error)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-readdb-shows-much-more-fetched-urls-than-parsed-tp3588205p3588205.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Markus Jelsma - CTO - Openindex

