Hi David
I notice that a developer has pointed out something very similar to what you're
describing in the code of OPICScoringFilter.java itself.
Sorry for being twp, but is this the bit I need to edit? I've never done
any Java at all before.
/** Get a float value from Fetcher.SCORE_KEY, divide it by the number of
 *  outlinks and apply. */
public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData,
    Collection<Entry<Text, CrawlDatum>> targets, CrawlDatum adjust,
    int allCount) throws ScoringFilterException {
  float score = scoreInjected;
  String scoreString = parseData.getContentMeta().get(Nutch.SCORE_KEY);
  if (scoreString != null) {
    try {
      score = Float.parseFloat(scoreString);
    } catch (Exception e) {
      e.printStackTrace(LogUtil.getWarnStream(LOG));
    }
  }
  int validCount = targets.size();
  if (countFiltered) {
    score /= allCount;
  } else {
    if (validCount == 0) {
      // no outlinks to distribute score, so just return adjust
      return adjust;
    }
    score /= validCount;
  }
  // internal and external score factor
  float internalScore = score * internalScoreFactor;
  float externalScore = score * externalScoreFactor;
  for (Entry<Text, CrawlDatum> target : targets) {
    try {
      String toHost = new URL(target.getKey().toString()).getHost();
      String fromHost = new URL(fromUrl.toString()).getHost();
      if (toHost.equalsIgnoreCase(fromHost)) {
        target.getValue().setScore(internalScore);
      } else {
        target.getValue().setScore(externalScore);
      }
    } catch (MalformedURLException e) {
      e.printStackTrace(LogUtil.getWarnStream(LOG));
      target.getValue().setScore(externalScore);
    }
  }
  // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
  // XXX in the paper, where page "loses" its score if it's distributed to
  // XXX linked pages...
  return adjust;
}
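
To check that I'm reading it correctly, here is a stripped-down sketch of what the method above appears to do, using plain strings and floats instead of the Nutch types. The class name ScoreSplitSketch, the method distribute and the example URLs are made up for illustration and are not part of Nutch; the real score, factors and countFiltered flag come from the filter's own fields, which I am passing in as parameters here.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical standalone sketch, not Nutch code.
public class ScoreSplitSketch {

  /**
   * Divides a page's score by the number of outlinks, then scales each
   * share by an internal or external factor depending on whether the
   * target is on the same host as the source page.
   */
  static Map<String, Float> distribute(String fromUrl, List<String> targets,
      float score, int allCount, boolean countFiltered,
      float internalScoreFactor, float externalScoreFactor) {
    Map<String, Float> result = new LinkedHashMap<>();
    if (!countFiltered && targets.isEmpty()) {
      // no outlinks to distribute the score to
      return result;
    }
    // countFiltered: divide by all outlinks found, not only those kept
    float share = score / (countFiltered ? allCount : targets.size());
    float internalScore = share * internalScoreFactor;
    float externalScore = share * externalScoreFactor;
    for (String target : targets) {
      float value = externalScore;
      try {
        String toHost = new URL(target).getHost();
        String fromHost = new URL(fromUrl).getHost();
        if (toHost.equalsIgnoreCase(fromHost)) {
          value = internalScore;
        }
      } catch (MalformedURLException e) {
        // as in the original, an unparsable URL is treated as external
      }
      result.put(target, value);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Float> scores = distribute(
        "http://a.example/index.html",
        List.of("http://a.example/about.html", "http://b.example/"),
        1.0f, 4, false, 0.5f, 2.0f);
    scores.forEach((url, s) -> System.out.println(url + " -> " + s));
  }
}
```

If that reading is right, the parts that decide how much score each link receives are the two divisions and the two factor multiplications, so those look like the lines to change.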
--
View this message in context:
http://www.nabble.com/score-calculation-tp17695314p17733194.html
Sent from the Nutch - User mailing list archive at Nabble.com.