You are partially right:
the method org.apache.nutch.scoring.opic.OPICScoringFilter.distributeScoreToOutlinks does indeed influence the document boost value depending on the number of outlinks its "parent document" has (I think). This is pretty straightforward; the two key lines of code are:

    score /= allCount;
    score /= validCount;

But if what you want is a constant boost value of 1.0 for every indexed document, the key method is org.apache.nutch.scoring.opic.OPICScoringFilter.indexerScore. It returns the document boost value.

Hope it helps!

David

-----Original Message-----
From: vanderkerkoff [mailto:[EMAIL PROTECTED]
Sent: Monday, 9 June 2008 15:47
To: [email protected]
Subject: RE: score calculation

Hi David

I notice that a developer has pointed out something very similar to what you're describing in the code of OPICScoringFilter.java itself. Sorry for being twp, but is this the bit I need to edit? I've never done any Java at all before.

    /**
     * Get a float value from Fetcher.SCORE_KEY, divide it by the number of
     * outlinks and apply.
     */
    public CrawlDatum distributeScoreToOutlinks(Text fromUrl, ParseData parseData,
        Collection<Entry<Text, CrawlDatum>> targets, CrawlDatum adjust,
        int allCount) throws ScoringFilterException {
      float score = scoreInjected;
      String scoreString = parseData.getContentMeta().get(Nutch.SCORE_KEY);
      if (scoreString != null) {
        try {
          score = Float.parseFloat(scoreString);
        } catch (Exception e) {
          e.printStackTrace(LogUtil.getWarnStream(LOG));
        }
      }
      int validCount = targets.size();
      if (countFiltered) {
        score /= allCount;
      } else {
        if (validCount == 0) {
          // no outlinks to distribute score, so just return adjust
          return adjust;
        }
        score /= validCount;
      }
      // internal and external score factor
      float internalScore = score * internalScoreFactor;
      float externalScore = score * externalScoreFactor;
      for (Entry<Text, CrawlDatum> target : targets) {
        try {
          String toHost = new URL(target.getKey().toString()).getHost();
          String fromHost = new URL(fromUrl.toString()).getHost();
          if (toHost.equalsIgnoreCase(fromHost)) {
            target.getValue().setScore(internalScore);
          } else {
            target.getValue().setScore(externalScore);
          }
        } catch (MalformedURLException e) {
          e.printStackTrace(LogUtil.getWarnStream(LOG));
          target.getValue().setScore(externalScore);
        }
      }
      // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
      // XXX in the paper, where page "loses" its score if it's distributed to
      // XXX linked pages...
      return adjust;
    }

--
View this message in context: http://www.nabble.com/score-calculation-tp17695314p17733194.html
Sent from the Nutch - User mailing list archive at Nabble.com.
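For anyone who, like the original poster, is new to Java, the arithmetic in distributeScoreToOutlinks can be pulled out into a small standalone program to see what the two division lines actually do. This is a hypothetical sketch, not Nutch code: the class OutlinkScoreSketch and its distribute method are made up for illustration, and plain String URLs and float parameters stand in for Nutch's Text, CrawlDatum and ParseData types. The page's score is split evenly over either all outlinks (countFiltered = true, dividing by allCount) or only the outlinks that survived filtering (dividing by validCount), and each share is then scaled by an internal or external factor depending on whether the target is on the same host.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Nutch-free sketch of the score splitting done in
 * OPICScoringFilter.distributeScoreToOutlinks. Names are illustrative only.
 */
public class OutlinkScoreSketch {

    /** Returns the score each target URL would receive, in input order. */
    public static Map<String, Float> distribute(String fromUrl, float score,
            List<String> targets, int allCount, boolean countFiltered,
            float internalScoreFactor, float externalScoreFactor) {
        Map<String, Float> result = new LinkedHashMap<>();
        int validCount = targets.size();
        if (countFiltered) {
            // Split over every outlink found, including filtered-out ones.
            score /= allCount;
        } else {
            if (validCount == 0) {
                return result; // nothing to distribute to
            }
            // Split only over the outlinks that survived filtering.
            score /= validCount;
        }
        float internalScore = score * internalScoreFactor;
        float externalScore = score * externalScoreFactor;
        for (String target : targets) {
            try {
                String toHost = new URL(target).getHost();
                String fromHost = new URL(fromUrl).getHost();
                // Same host gets the internal share, other hosts the external one.
                result.put(target, toHost.equalsIgnoreCase(fromHost) ? internalScore : externalScore);
            } catch (MalformedURLException e) {
                // Like the original, treat unparseable URLs as external.
                result.put(target, externalScore);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> outlinks = Arrays.asList("http://example.com/a", "http://other.org/b");
        // 4 outlinks found, 2 kept; divide by the 2 kept ones.
        System.out.println(distribute("http://example.com/", 1.0f, outlinks, 4, false, 1.0f, 0.5f));
        // prints {http://example.com/a=0.5, http://other.org/b=0.25}
        // Same page, but divide by all 4 outlinks found.
        System.out.println(distribute("http://example.com/", 1.0f, outlinks, 4, true, 1.0f, 0.5f));
        // prints {http://example.com/a=0.25, http://other.org/b=0.125}
    }
}
```

The point to take away is that a page with many outlinks hands each target a smaller slice of its score, which is why David describes the boost as depending on the parent document's outlink count.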

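David's alternative is to leave the score splitting alone and make the indexer-score step ignore it. The sketch below shows the idea in isolation. It is hypothetical: BoostPolicy, SCORE_DEPENDENT and CONSTANT are invented names, not Nutch's ScoringFilter interface, and the score-dependent policy is a made-up formula for contrast, not what OPICScoringFilter.indexerScore actually computes. To apply the idea in Nutch itself you would edit indexerScore and check its real signature in your Nutch version.

```java
/**
 * Hypothetical sketch (not Nutch's ScoringFilter API): contrasts a boost
 * that follows the page's link score with one pinned at 1.0f.
 */
public class ConstantBoostSketch {

    /** Hypothetical stand-in for the indexer-score hook. */
    public interface BoostPolicy {
        float boost(float pageScore, float initScore);
    }

    /** Made-up score-dependent policy, used only for contrast. */
    public static final BoostPolicy SCORE_DEPENDENT =
            (pageScore, initScore) -> pageScore * initScore;

    /** The suggestion from the thread: always return a boost of 1.0. */
    public static final BoostPolicy CONSTANT =
            (pageScore, initScore) -> 1.0f;

    public static void main(String[] args) {
        // Page scores a child might inherit from parents with 2, 4 and 8 outlinks.
        float[] pageScores = {0.5f, 0.25f, 0.125f};
        for (float s : pageScores) {
            System.out.println("score " + s
                    + ": score-dependent boost " + SCORE_DEPENDENT.boost(s, 1.0f)
                    + ", constant boost " + CONSTANT.boost(s, 1.0f));
        }
    }
}
```

With the constant policy every document gets the same boost, so the outlink count of its parent no longer matters at indexing time, while the crawl-time scores computed by distributeScoreToOutlinks are left untouched.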