You are partially right: 

the method
org.apache.nutch.scoring.opic.OPICScoringFilter.distributeScoreToOutlinks
does indeed influence the document boost value, depending on the number of
outlinks its "parent document" has (I think). This is pretty
straightforward; the two key lines of code are:

score /= allCount;
score /= validCount;
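
To make the effect of those two lines concrete, here is a minimal sketch of
the division step only. The class and method names (OutlinkScoreSketch,
perOutlinkScore) are made up for illustration and are not part of Nutch, and
returning 0 for an empty divisor is a simplification of what the real method
does:

```java
// Hypothetical stand-alone sketch, not Nutch code: it mirrors only the
// "score /= allCount" and "score /= validCount" step of the OPIC filter.
class OutlinkScoreSketch {

    // parentScore:   score of the page whose outlinks are being scored
    // allCount:      number of outlinks found on the page
    // validCount:    number of outlinks that survived URL filtering
    // countFiltered: if true, divide by allCount, otherwise by validCount
    static float perOutlinkScore(float parentScore, int allCount,
                                 int validCount, boolean countFiltered) {
        int divisor = countFiltered ? allCount : validCount;
        if (divisor == 0) {
            // Simplification (assumption): nothing to distribute.
            return 0.0f;
        }
        return parentScore / divisor;
    }

    public static void main(String[] args) {
        // A page with score 1.0, 10 outlinks in total, 4 of them kept.
        System.out.println(perOutlinkScore(1.0f, 10, 4, false)); // 0.25
        System.out.println(perOutlinkScore(1.0f, 10, 4, true));  // 0.1
    }
}
```

So the more outlinks the parent page has, the smaller the score each linked
page inherits from it.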


But if what you want is a constant boost value of 1.0 for every
indexed document, the key method is
org.apache.nutch.scoring.opic.OPICScoringFilter.indexerScore. It returns
the document boost value, so that is the place to change.
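
As a rough illustration of that change, here is a minimal sketch. It does not
use the real Nutch signature: the class ConstantBoostSketch, both method
names, and the parameters are hypothetical. The score-based formula is only
an assumption about how a boost might be derived from the stored page score,
so check it against your own copy of OPICScoringFilter.java:

```java
// Hypothetical stand-alone sketch, not Nutch code.
class ConstantBoostSketch {

    // Assumed shape of a score-dependent boost: the stored page score
    // raised to some power, multiplied by the boost computed so far.
    static float scoreBasedBoost(float dbScore, float scorePower,
                                 float initScore) {
        return (float) Math.pow(dbScore, scorePower) * initScore;
    }

    // The change discussed above: ignore the page score entirely and
    // hand back the same boost for every indexed document.
    static float constantBoost(float dbScore, float scorePower,
                               float initScore) {
        return 1.0f;
    }

    public static void main(String[] args) {
        System.out.println(scoreBasedBoost(4.0f, 0.5f, 1.0f)); // 2.0
        System.out.println(constantBoost(4.0f, 0.5f, 1.0f));   // 1.0
    }
}
```

In the real class you would keep the existing method signature and only
replace the returned expression with the constant.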

Hope it helps!

David



-----Original Message-----
From: vanderkerkoff [mailto:[EMAIL PROTECTED] 
Sent: Monday, 9 June 2008 15:47
To: [email protected]
Subject: RE: score calculation


Hi David

I notice that a developer has pointed out something very similar to what
you're describing in the code of OPICScoringFilter.java itself.


Sorry for being twp, but is this the bit I need to edit?  I've never
done any Java at all before.


  /**
   * Get a float value from Fetcher.SCORE_KEY, divide it by the number of
   * outlinks and apply.
   */
  public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData,
      Collection<Entry<Text, CrawlDatum>> targets, CrawlDatum adjust,
      int allCount) throws ScoringFilterException {
    float score = scoreInjected;
    String scoreString = parseData.getContentMeta().get(Nutch.SCORE_KEY);
    if (scoreString != null) {
      try {
        score = Float.parseFloat(scoreString);
      } catch (Exception e) {
        e.printStackTrace(LogUtil.getWarnStream(LOG));
      }
    }
    int validCount = targets.size();
    if (countFiltered) {
      score /= allCount;
    } else {
      if (validCount == 0) {
        // no outlinks to distribute score, so just return adjust
        return adjust;
      }
      score /= validCount;
    }
    // internal and external score factor
    float internalScore = score * internalScoreFactor;
    float externalScore = score * externalScoreFactor;
    for (Entry<Text, CrawlDatum> target : targets) {
      try {
        String toHost = new URL(target.getKey().toString()).getHost();
        String fromHost = new URL(fromUrl.toString()).getHost();
        if(toHost.equalsIgnoreCase(fromHost)){
          target.getValue().setScore(internalScore);
        } else {
          target.getValue().setScore(externalScore);
        }
      } catch (MalformedURLException e) {
        e.printStackTrace(LogUtil.getWarnStream(LOG));
        target.getValue().setScore(externalScore);
      }
    }
    // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
    // XXX in the paper, where page "loses" its score if it's distributed to
    // XXX linked pages...
    return adjust;
  }

-- 
View this message in context:
http://www.nabble.com/score-calculation-tp17695314p17733194.html
Sent from the Nutch - User mailing list archive at Nabble.com.
