I have had a similar experience.

Reinhard Schwab posted a possible fix. See his mail to this group from
Sun, 25 Oct 2009 10:03:41 +0100 (05:03 EDT).

I haven't had a chance to try it out yet.
 
On Tue, 2009-10-27 at 07:34 -0700, caezar wrote:
> Hi All,
> 
> I've got a strange problem: Nutch indexes far fewer URLs than it
> fetches. For example, take this URL:
> http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm.
> I assume it was fetched successfully because the fetch logs mention it only
> once:
> 2009-10-26 10:01:46,502 INFO org.apache.nutch.fetcher.Fetcher: fetching
> http://www.1stdirectory.com/Companies/1627406_ins_Catering_Limited.htm
> 
> But it was not sent to the indexer during the indexing phase (I'm using a
> custom NutchIndexWriter, and it logs every page for which its write method
> is executed). What could be the reason? Is there a way to browse the crawldb
> to check that the page was really fetched? What else could I check?
> 
> Thanks
