Hi,
One observation: you are using the 1.3 branch, which by now contains some
older code. For example, the Fetcher class has since been updated to deal
with NUTCH-962, which is mentioned at the top of the class as per your URL
example.
Can anyone explain what the existing metadata transferred in the snippet
below consists of, if it does not include the score as you state?
    } else {
      CrawlDatum newDatum = new CrawlDatum(CrawlDatum.STATUS_LINKED,
          datum.getFetchInterval());
      // transfer existing metadata
      newDatum.getMetaData().putAll(datum.getMetaData());
      try {
        scfilters.initialScore(url, newDatum);
I would have imagined that the metadata would include the initial score we
are discussing, if transferring the original URL's metadata to a redirect
is to be of any use?
Apart from this, with the addition of your datum.getScore(), do the new
scores attributed to the URL redirects accurately reflect your general
understanding of the web graph?
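
To make the question concrete, here is a minimal, self-contained sketch of the behaviour as I understand it from your description. It does not use the Nutch classes: Datum and initialScore() below are hypothetical stand-ins, and the sketch assumes that the score is held in its own field rather than in the metadata map, and that the scoring filter's initialScore() resets the score to zero. If either assumption does not hold for the real CrawlDatum or for your scoring plugin, the sketch does not apply.

```java
import java.util.HashMap;
import java.util.Map;

public class RedirectScoreSketch {

    // Hypothetical stand-in for CrawlDatum. Assumption: the score is a
    // separate field and is NOT stored in the metadata map.
    static class Datum {
        private float score;
        private final Map<String, String> metaData = new HashMap<>();

        Datum(float score) { this.score = score; }
        float getScore() { return score; }
        void setScore(float score) { this.score = score; }
        Map<String, String> getMetaData() { return metaData; }
    }

    // Hypothetical stand-in for a scoring filter whose initialScore()
    // sets the score to zero, as described in the quoted message.
    static void initialScore(Datum datum) {
        datum.setScore(0.0f);
    }

    // Mirrors the shape of the redirect branch: copy the metadata, call
    // initialScore(), then optionally carry over the original score.
    static Datum redirect(Datum original, boolean carryScore) {
        Datum newDatum = new Datum(0.0f);
        // transfer existing metadata (this does not touch the score field)
        newDatum.getMetaData().putAll(original.getMetaData());
        initialScore(newDatum);
        if (carryScore) {
            // the proposed extra line, applied after initialScore()
            newDatum.setScore(original.getScore());
        }
        return newDatum;
    }

    public static void main(String[] args) {
        Datum original = new Datum(1.0f);
        original.getMetaData().put("example.key", "example value");

        System.out.println(redirect(original, false).getScore()); // prints 0.0
        System.out.println(redirect(original, true).getScore());  // prints 1.0
    }
}
```

Under those assumptions, copying the metadata alone leaves the redirect target at zero, and the score only survives when it is set explicitly after initialScore() has run.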
On Tue, Jul 12, 2011 at 12:45 PM, Nutch User - 1 <[email protected]> wrote:
> I have mentioned earlier
> (
> http://lucene.472066.n3.nabble.com/URL-redirection-and-zero-scores-td3085311.html
> )
> that I have encountered a problem in which redirected URLs and possibly,
> depending on the topology of the graph, all URLs inlinked to them will
> have zero scores.
>
> For instance, on line 818 of Fetcher.java
> (
> http://svn.apache.org/viewvc/nutch/branches/branch-1.3/src/java/org/apache/nutch/fetcher/Fetcher.java?view=markup
> )
> a new CrawlDatum is created for the redirected URL but nowhere is the
> original URL's CrawlDatum's score passed to the new one. ScoringFilter
> interface's initialScore() method is called for the new CrawlDatum, but
> it only sets the score to zero.
>
> Is this how it was meant to be or is there a flaw?
>
> I started a crawl from http://www.aalto.fi which is redirected to
> http://www.aalto.fi/fi/ (in my case). The URL http://www.aalto.fi had
> 1.0f as its score but every other URL had 0.0f, which in my opinion
> indicates that there's a problem. Adding
> "newDatum.setScore(datum.getScore());" after the call to initialScore()
> resulted in a situation where none of the URLs' scores is zero.
>
--
*Lewis*