As far as I know, it's not possible to do this via properties or from the
command line. It could be done in a custom scoring filter, because
FreeGenerator calls injectedScore() for all active scoring filter plugins.
We could also add such functionality to FreeGenerator itself. Feel free to
open an issue for that.
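
A minimal, self-contained sketch of what such a filter's injectedScore()
could look like. This is simplified: the real interface is
org.apache.nutch.scoring.ScoringFilter and operates on a CrawlDatum, and the
class name and the "freegen.score" property used here are hypothetical.

```java
import java.util.Properties;

// Simplified stand-in for a Nutch scoring filter plugin. The real plugin
// would implement org.apache.nutch.scoring.ScoringFilter; class and
// property names here are hypothetical.
public class FreeGenScoringFilter {
  private final float initialScore;

  public FreeGenScoringFilter(Properties conf) {
    // Read the initial score from configuration, defaulting to 1.0.
    this.initialScore =
        Float.parseFloat(conf.getProperty("freegen.score", "1.0"));
  }

  /** Mirrors injectedScore(): assign the score of a newly added URL. */
  public float injectedScore(String url, float currentScore) {
    return initialScore;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("freegen.score", "0.5");
    FreeGenScoringFilter filter = new FreeGenScoringFilter(conf);
    // Newly generated URLs would receive the configured score:
    System.out.println(filter.injectedScore("http://example.com/", 0.0f));
  }
}
```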


2014-03-26 15:57 GMT+01:00 John Lafitte <[email protected]>:

> Thanks Sebastian,
>
> That did work when I set both of those to false, but now the URL I'm
> inserting has an abnormally high score.  You mentioned two options; the
> first was to use FreeGenerator with an initial score, however I cannot find
> it documented anywhere how to do that.  The only parameters I see are
> normalize and filter, and they don't take values.  Can you point me in the
> right direction for that?
>
>
> On Wed, Mar 26, 2014 at 6:59 AM, Sebastian Nagel <
> [email protected]> wrote:
>
> > There may be no relevant links if all documents are from one single host
> > (or domain) and
> >  (link.ignore.internal.host == true)
> > or, respectively,
> >  (link.ignore.internal.domain == true)
> > cf. the explanations about that in the wiki.
> >
> >
> > 2014-03-26 4:09 GMT+01:00 John Lafitte <[email protected]>:
> >
> > > Thanks for that Sebastian.  So given the hint you've given me, I'm
> trying
> > > to generate the scoring using this example:
> > > https://wiki.apache.org/nutch/NewScoringIndexingExample
> > >
> > > But when it gets to the LinkRank part I get:
> > >
> > > 2014-03-26 02:57:14,208 INFO  webgraph.LinkRank - Analysis: starting at 2014-03-26 02:57:14
> > > 2014-03-26 02:57:14,913 INFO  webgraph.LinkRank - Starting link counter job
> > > 2014-03-26 02:57:17,927 INFO  webgraph.LinkRank - Finished link counter job
> > > 2014-03-26 02:57:17,928 INFO  webgraph.LinkRank - Reading numlinks temp file
> > > 2014-03-26 02:57:17,932 ERROR webgraph.LinkRank - LinkAnalysis:
> > > java.io.IOException: No links to process, is the webgra$
> > >         at org.apache.nutch.scoring.webgraph.LinkRank.runCounter(LinkRank.java:132)
> > >         at org.apache.nutch.scoring.webgraph.LinkRank.analyze(LinkRank.java:622)
> > >         at org.apache.nutch.scoring.webgraph.LinkRank.run(LinkRank.java:702)
> > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >         at org.apache.nutch.scoring.webgraph.LinkRank.main(LinkRank.java:668)
> > >
> > > I can see the webgraph directory got created and there are directories
> > > and files in there, but I'm guessing something is not getting populated
> > > correctly.  Any clue what I may be doing wrong?
> > >
> > >
> > > On Tue, Mar 25, 2014 at 4:15 PM, Sebastian Nagel <
> > > [email protected]> wrote:
> > >
> > > > Hi John,
> > > >
> > > > FreeGenerator, unlike Injector, does not use db.score.injected
> > > > (default = 1.0) but sets the initial score to 0.0. If all URLs stem
> > > > from FreeGenerator, the total score in the link graph is also 0.0,
> > > > and no linked documents can get a score higher than 0.0.
> > > > Possible solutions:
> > > > - use FreeGenerator with an initial score > 0.0
> > > >   (but don't put in thousands of URLs with a score of 1.0:
> > > >    if the total score is too high, some pages may get unreasonably
> > > >    high scores)
> > > > - use LinkRank (https://wiki.apache.org/nutch/NewScoring) to get the
> > > >   scores: the default scoring, OPIC, has the advantage of calculating
> > > >   scores online while following links. It gives good and plausible
> > > >   scores if the crawl is started from a few authoritative seeds. But
> > > >   sometimes, esp. in continuous crawls, OPIC scores run out of control.
> > > >
> > > > Sebastian
> > > >
> > > > On 03/25/2014 08:31 PM, John Lafitte wrote:
> > > > > I set up a script that uses freegen to manually index new/updated
> > > > > URLs.  I thought it was working great, but now I'm realizing that
> > > > > Solr returns a score of 0 for these new documents.  I thought the
> > > > > score was calculated independently from what Nutch does, just using
> > > > > the content and other metadata, however that doesn't seem to be the
> > > > > case.  Anyone have a clue what might be causing this?  The content
> > > > > and other metadata look normal and I reloaded the core to no avail.
> > > > >
> > > >
> > > >
> > >
> >
>
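
The score-propagation point in the quoted thread can be illustrated with a
toy, self-contained model (a single simplified OPIC-style step where each
page splits its score evenly among its outlinks; this is an illustration,
not Nutch's actual implementation): if every injected page starts at 0.0,
every page reachable from them also ends up at 0.0.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScorePropagationDemo {
  // One simplified OPIC-style step: each page splits its score
  // evenly among its outlinks.
  static Map<String, Double> propagate(Map<String, Double> scores,
                                       Map<String, List<String>> links) {
    Map<String, Double> next = new HashMap<>();
    for (Map.Entry<String, List<String>> e : links.entrySet()) {
      double share = scores.getOrDefault(e.getKey(), 0.0) / e.getValue().size();
      for (String target : e.getValue()) {
        next.merge(target, share, Double::sum);
      }
    }
    return next;
  }

  public static void main(String[] args) {
    Map<String, List<String>> links = Map.of(
        "seed", List.of("a", "b"),
        "a", List.of("b"));
    // Seed injected with score 0.0, as FreeGenerator does by default:
    // every reachable page stays at 0.0.
    System.out.println(propagate(Map.of("seed", 0.0, "a", 0.0), links));
    // Seed injected with score 1.0 (Injector's db.score.injected default):
    // linked pages now receive a non-zero share.
    System.out.println(propagate(Map.of("seed", 1.0, "a", 0.0), links));
  }
}
```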
