AFAIK, it's not possible to do this via properties or from the command line. It could be done in a custom scoring filter, because FreeGenerator calls injectedScore() for all active scoring filter plugins. We could also add such functionality to FreeGenerator itself. Feel free to open an issue for that.
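
To make the point from earlier in this thread concrete (an initial score of 0.0 leaves every linked document at 0.0), here is a small toy model. It is only a sketch and not Nutch code: the class ScoreFlowSketch, the method distribute() and the three-page link graph are made up for illustration, and the "split the score evenly among outlinks" rule is a simplification of what OPIC-style scoring does.

```java
import java.util.Arrays;

// Toy model only: ScoreFlowSketch and distribute() are hypothetical names,
// not part of Nutch. Pages are identified by their array index.
class ScoreFlowSketch {

    // One round of score distribution: every page splits its current score
    // evenly among its outlinks. A page without outlinks keeps its score.
    static double[] distribute(double[] scores, int[][] outlinks) {
        double[] next = new double[scores.length];
        for (int page = 0; page < scores.length; page++) {
            int n = outlinks[page].length;
            if (n == 0) {
                next[page] += scores[page];
                continue;
            }
            double share = scores[page] / n;
            for (int target : outlinks[page]) {
                next[target] += share;
            }
        }
        return next;
    }

    // Sum of all scores; a round of distribute() moves score around
    // but never creates any.
    static double total(double[] scores) {
        return Arrays.stream(scores).sum();
    }

    public static void main(String[] args) {
        // Assumed graph: page 0 links to pages 1 and 2, page 1 links to page 2.
        int[][] outlinks = { {1, 2}, {2}, {} };

        // Seed with initial score 0.0 (the FreeGenerator case in this thread).
        double[] zeroSeed = distribute(new double[] {0.0, 0.0, 0.0}, outlinks);
        // Seed with initial score 1.0 (the db.score.injected default).
        double[] unitSeed = distribute(new double[] {1.0, 0.0, 0.0}, outlinks);

        System.out.println("seed 0.0: " + Arrays.toString(zeroSeed));
        System.out.println("seed 1.0: " + Arrays.toString(unitSeed));
    }
}
```

With a seed score of 0.0 there is nothing to hand out, so every page stays at 0.0. With a seed score of 1.0 the two linked pages each receive 0.5 and the total stays at 1.0. The same conservation is why seeding thousands of URLs with 1.0 each can push some pages to very high scores.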

2014-03-26 15:57 GMT+01:00 John Lafitte <[email protected]>:

> Thanks Sebastian,
>
> That did work when I set both of those to false, but now the URL I'm
> inserting has an abnormally high score. You mentioned two options, the
> first was to use FreeGenerator with an initial score, however I cannot
> find it documented anywhere how to do that. The only parameters I see
> are normalize and filter and they don't take values. Can you point me
> in the right direction for that?
>
>
> On Wed, Mar 26, 2014 at 6:59 AM, Sebastian Nagel <[email protected]> wrote:
>
> > There may be no relevant links if all documents are from one single
> > host (or domain) and
> >   (link.ignore.internal.host == true)
> > resp.
> >   (link.ignore.internal.domain == true)
> > cf. explanations about that in the wiki.
> >
> >
> > 2014-03-26 4:09 GMT+01:00 John Lafitte <[email protected]>:
> >
> > > Thanks for that Sebastian. So given the hint you've given me, I'm
> > > trying to generate the scoring using this example:
> > > https://wiki.apache.org/nutch/NewScoringIndexingExample
> > >
> > > But when it gets to the LinkRank part I get:
> > >
> > > 2014-03-26 02:57:14,208 INFO webgraph.LinkRank - Analysis: starting at 2014-03-26 02:57:14
> > > 2014-03-26 02:57:14,913 INFO webgraph.LinkRank - Starting link counter job
> > > 2014-03-26 02:57:17,927 INFO webgraph.LinkRank - Finished link counter job
> > > 2014-03-26 02:57:17,928 INFO webgraph.LinkRank - Reading numlinks temp file
> > > 2014-03-26 02:57:17,932 ERROR webgraph.LinkRank - LinkAnalysis:
> > > java.io.IOException: No links to process, is the webgra$
> > >   at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:132)
> > >   at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:622)
> > >   at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:702)
> > >   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >   at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:668)
> > >
> > > I can see the webgraph directory got created and there are
> > > directories and files in there, but I'm guessing something is not
> > > getting populated correctly. Any clue what I may be doing wrong?
> > >
> > >
> > > On Tue, Mar 25, 2014 at 4:15 PM, Sebastian Nagel <[email protected]> wrote:
> > >
> > > > Hi John,
> > > >
> > > > FreeGenerator, unlike Injector, does not use db.score.injected
> > > > (default = 1.0) but sets the initial score to 0.0. If all URLs
> > > > stem from FreeGenerator the total score in the link graph is also
> > > > 0.0, and no linked documents can get a higher score than 0.0.
> > > > As possible solutions:
> > > > - use FreeGenerator with an initial score > 0.0
> > > >   (but don't put thousands of URLs with a score of 1.0:
> > > >   if the total score is too high some pages may get unreasonably
> > > >   high scores)
> > > > - use linkrank (https://wiki.apache.org/nutch/NewScoring) to get
> > > >   the scores:
> > > >   the default scoring OPIC has the advantage of calculating
> > > >   scores online while following links. It gives good and
> > > >   plausible scores if the crawl is started from a few
> > > >   authoritative seeds. But sometimes, esp. in continuous crawls,
> > > >   OPIC scores run out of control.
> > > >
> > > > Sebastian
> > > >
> > > > On 03/25/2014 08:31 PM, John Lafitte wrote:
> > > > > I set up a script that uses freegen to manually index
> > > > > new/updated URLs. I thought it was working great, but now I'm
> > > > > just realizing that Solr returns a score of 0 for these new
> > > > > documents. I thought the score was calculated independently of
> > > > > what Nutch does, just using the content and other metadata to
> > > > > calculate it, however that doesn't seem to be the case. Anyone
> > > > > have a clue what might be causing this? The content and other
> > > > > metadata look normal and I reloaded the core to no avail.

