Hi All,

I've got a strange problem: Nutch indexes far fewer URLs than it
fetches. For example, take this URL:
http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm
I assume it was fetched successfully, because it is mentioned only once
in the fetch logs:
2009-10-26 10:01:46,502 INFO org.apache.nutch.fetcher.Fetcher: fetching
http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm

But it was not sent to the indexer during the indexing phase (I'm using a
custom NutchIndexWriter that logs every page for which its write method is
executed). What could be the reason? Is there a way to browse the crawldb
to confirm that the page was really fetched? What else could I check?

Thanks
-- 
View this message in context: 
http://www.nabble.com/Nutch-indexes-less-pages%2C-then-it-fetches-tp26078798p26078798.html
Sent from the Nutch - User mailing list archive at Nabble.com.
