Hi All, I've got a strange problem: Nutch indexes far fewer URLs than it fetches. Take, for example, the URL http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm. I assume it was fetched successfully because it is mentioned only once in the fetch logs:
2009-10-26 10:01:46,502 INFO org.apache.nutch.fetcher.Fetcher: fetching http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm
But it was not sent to the indexer during the indexing phase (I'm using a custom NutchIndexWriter that logs every page for which its write method is executed). What could be the reason? Is there a way to browse the crawldb to confirm that the page was really fetched? What else could I check? Thanks
-- 
View this message in context: http://www.nabble.com/Nutch-indexes-less-pages%2C-then-it-fetches-tp26078798p26078798.html Sent from the Nutch - User mailing list archive at Nabble.com.
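One way to narrow this down is to diff the URLs in the fetch log against the URLs the custom index writer logged. This is a minimal sketch of a hypothetical helper, not part of Nutch. It assumes the Fetcher log format quoted above, and it assumes the URLs written by the custom NutchIndexWriter have already been pulled into a list, since that writer's log format isn't shown. Keep in mind that, as far as I know, the Fetcher writes the "fetching" line when a fetch starts, not when it succeeds, so a URL in the diff may simply have failed to fetch or parse.

```python
import re

# Matches Fetcher log lines in the format quoted in the post, e.g.
# "... INFO org.apache.nutch.fetcher.Fetcher: fetching http://..."
FETCH_RE = re.compile(r"org\.apache\.nutch\.fetcher\.Fetcher: fetching (\S+)")


def fetched_urls(log_lines):
    """Collect every URL the Fetcher reports it started fetching."""
    urls = set()
    for line in log_lines:
        match = FETCH_RE.search(line)
        if match:
            urls.add(match.group(1))
    return urls


def missing_from_index(fetch_log_lines, indexed_urls):
    """Return URLs seen in the fetch log but never passed to the index writer."""
    return sorted(fetched_urls(fetch_log_lines) - set(indexed_urls))


if __name__ == "__main__":
    fetch_log = [
        "2009-10-26 10:01:46,502 INFO org.apache.nutch.fetcher.Fetcher: "
        "fetching http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm",
    ]
    # Hypothetical: URLs extracted from the custom NutchIndexWriter's own log.
    indexed = []
    for url in missing_from_index(fetch_log, indexed):
        print(url)
```

Each URL this prints is a candidate to look up in the crawldb, to see what status it actually ended up with.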