On 07/12/2011 08:09 PM, lewis john mcgibbney wrote:
> Well I think in order to address the problem directly it would be better to
> focus on getting something working with a distribution of Nutch you are most
> comfortable working with. For the time being I would avoid working with
> trunk 2.0 unless you can justify otherwise. I would also either make a
> decision between Nutch 1.2 and the current 1.3 release rather than focussing
> on previous branches, which may or may not be stable depending on when you
> last svn updated.
>
> If you can try working with a fresh 1.2 or 1.3 (preferrably 1.3) then we
> could maybe get to the bottom of this one as it would be great to find
> whether there is scope to file a JIRA with this.
>
> Thank you

Currently I'm working with the official 1.3 distribution of Nutch (apache-nutch-1.3-bin.zip). I have encountered this URL redirection and zero scores problem in both 1.2 and 1.3. With the quick fix I made, I crawled ~12k pages and none of the URLs in the CrawlDB had a score of zero. Before the fix, crawling the same pages left ~1.5k of the URLs with zero scores.
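
For anyone who wants to repeat the zero-score check, here is a rough sketch of one way to count them. It assumes the CrawlDB has been dumped to text first (for example with `bin/nutch readdb crawl/crawldb -dump crawldb_dump`) and that each entry in the dump contains a line of the form `Score: <number>`. The function name and the dump file path are made up for illustration, and the line format is an assumption that may differ between versions.

```python
import re
import sys

# Assumed dump format: each CrawlDB entry has a line like "Score: 0.0".
SCORE_LINE = re.compile(r"^Score:\s*(\S+)\s*$")


def count_zero_scores(lines):
    """Return (total_entries, zero_score_entries) for an iterable of dump lines."""
    total = 0
    zeros = 0
    for line in lines:
        match = SCORE_LINE.match(line.strip())
        if not match:
            continue  # not a score line
        try:
            score = float(match.group(1))
        except ValueError:
            continue  # "Score:" followed by something that is not a number
        total += 1
        if score == 0.0:
            zeros += 1
    return total, zeros


if __name__ == "__main__":
    # Hypothetical usage: python count_zero_scores.py crawldb_dump/part-00000
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as dump:
            total, zeros = count_zero_scores(dump)
        print(f"{zeros} of {total} entries have a zero score")
```

Running it against the dump before and after a fix gives a quick way to compare the two crawls.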

