Hi
 
-----Original message-----
> From:Sebastian Nagel <wastl.na...@googlemail.com>
> Sent: Monday 2nd December 2013 21:45
> To: user@nutch.apache.org
> Subject: Re: Score value lost after two successive redirects?
> 
> Hi Yann,
> 
> confirmed: redirects get score 0.0
> because initialScore() is called
> (and nothing more is done).
> 
> Redirects shouldn't be different from
> "ordinary" outlinks and the score
> should be "distributed" from redirect
> source to target. Of course, there
> is only one target per redirect, but
> it may happen that many redirects
> point to the same target.

I thought about that too; it cannot be a good idea to just copy over the score. 
It could set the score of any high-scoring link back to zero, at least for some 
time, until another redirect overwrites it again. And what about the score the 
target has earned on its own through some scoring algorithm?

It is also a problem when a redirect changes its target: what should we do with 
the old target's score? Reset it? To what? 
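
A toy model may make the copy-versus-add question concrete. This is a plain-Java 
sketch, not Nutch code: the class and method names are hypothetical and the 
CrawlDb is reduced to a map from URL to score. It compares copying a redirect 
source's score onto the target with adding it to the target's existing score, 
for two redirects that point to the same target.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model, not part of Nutch: two ways a redirect could hand its score to its target. */
class RedirectScoreModes {

    /** Copy semantics: the target's score is overwritten, losing its own score and earlier contributions. */
    static void copyScore(Map<String, Float> db, String source, String target) {
        db.put(target, db.getOrDefault(source, 0.0f));
    }

    /** Additive semantics: the source's score is added to the target's, like an ordinary inlink contribution. */
    static void addScore(Map<String, Float> db, String source, String target) {
        db.merge(target, db.getOrDefault(source, 0.0f), Float::sum);
    }

    /** Two hypothetical redirects pointing at one target that already has a score of its own. */
    static Map<String, Float> sampleDb() {
        Map<String, Float> db = new HashMap<>();
        db.put("redirA", 2.0f);
        db.put("redirB", 0.0f);
        db.put("target", 5.0f); // score the target earned by itself
        return db;
    }

    public static void main(String[] args) {
        Map<String, Float> copied = sampleDb();
        copyScore(copied, "redirA", "target");
        copyScore(copied, "redirB", "target"); // last redirect wins and resets the target
        System.out.println("copy: " + copied.get("target")); // copy: 0.0

        Map<String, Float> added = sampleDb();
        addScore(added, "redirA", "target");
        addScore(added, "redirB", "target");
        System.out.println("add:  " + added.get("target")); // add:  7.0
    }
}
```

Neither variant answers what to do when a redirect later changes its target: in 
this model the old target simply keeps whatever it received.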

At first I would think it is a good design not to propagate scores via 
redirects, but I'm interested. I think the best solution would be a custom 
scoring filter, plus a new call that lets scoring filters propagate scores via 
redirects. This way it would not be enabled by default.
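
Such a call could look roughly like the following. This is only a sketch under 
assumptions: RedirectScoring, passScoreToRedirect, NoRedirectScoring, 
AdditiveRedirectScoring and RedirectChain are hypothetical names, not part of 
Nutch's ScoringFilter interface. The default implementation leaves the target 
untouched; only an explicitly chosen filter propagates scores. The example 
walks a chain like URL1 -> URL2 -> URL3 from Yann's CrawlDb dump.

```java
import java.util.Arrays;

/** Hypothetical hook, not Nutch API: decides what a redirect target receives from its source. */
interface RedirectScoring {
    /** Returns the new score of the redirect target. */
    float passScoreToRedirect(float sourceScore, float targetScore);
}

/** Default behaviour: nothing is propagated, the target keeps its own score. */
class NoRedirectScoring implements RedirectScoring {
    @Override
    public float passScoreToRedirect(float sourceScore, float targetScore) {
        return targetScore;
    }
}

/** Opt-in behaviour: treat the redirect like a single outlink and add the source's score. */
class AdditiveRedirectScoring implements RedirectScoring {
    @Override
    public float passScoreToRedirect(float sourceScore, float targetScore) {
        return targetScore + sourceScore;
    }
}

class RedirectChain {
    /** Applies the hook hop by hop along a redirect chain; the input array is not modified. */
    static float[] propagate(float[] chainScores, RedirectScoring scoring) {
        float[] out = chainScores.clone();
        for (int i = 0; i + 1 < out.length; i++) {
            out[i + 1] = scoring.passScoreToRedirect(out[i], out[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Scores of URL1, URL2, URL3 as reported in the CrawlDb dump.
        float[] chain = {408146.6f, 0.0f, 0.0f};
        System.out.println(Arrays.toString(propagate(chain, new NoRedirectScoring())));
        System.out.println(Arrays.toString(propagate(chain, new AdditiveRedirectScoring())));
    }
}
```

With the additive filter, URL1's score is credited to both URL2 and URL3, so a 
real design would still have to decide whether an intermediate redirect should 
keep the score it passes on.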
> 
> Can you open a Jira issue?
> 
> Thanks,
> Sebastian
> 
> 
> On 12/02/2013 04:57 PM, yann wrote:
> > Hi everybody,
> > 
> > I have the following problem:
> > 
> > URL1 is redirected (302) to URL2, which is redirected (301) to URL3, and the
> > score in URL1 is lost (not passed to URL2 and URL3).
> > 
> > The DB contains:
> > 
> > URL1        Version: 7
> > Status: 4 (db_redir_temp)
> > Score: 408146.6
> > Signature: null
> > Metadata: Content-Type: text/html_pst_: temp_moved(13), lastModified=0
> > 
> > URL2        Version: 7
> > Status: 5 (db_redir_perm)
> > Score: 0.0
> > Signature: null
> > Metadata: Content-Type: text/html_pst_: moved(12), lastModified=0: 
> > 
> > URL3 Version: 7
> > Status: 6 (db_notmodified)
> > Score: 0.0
> > Signature: 62bb3b6cb8c8aaab8c5d197a64beedbd
> > Metadata: Content-Type: application/xhtml+xml_pst_: success(1),
> > lastModified=0
> > 
> > URL3 shows in the Solr search results, but at the very bottom due to the
> > score 0, which gives a fieldNorm of 0; however this document is very
> > relevant for me, so I don't want it to have a zero score.
> > 
> > Is this a bug in Nutch, or is there a config param to say "transfer score
> > when a URL is redirected" that I'm missing? Note that URL2 itself has a
> > score of 0.
> > 
> > Thanks for any help,
> > 
> > Yann
> > 
> > 
> 
> 
