Hi

-----Original message-----
> From: Sebastian Nagel <wastl.na...@googlemail.com>
> Sent: Monday 2nd December 2013 21:45
> To: user@nutch.apache.org
> Subject: Re: Score value lost after two successive redirects?
>
> Hi Yann,
>
> confirmed: redirects get score 0.0
> because initialScore() is called
> (and nothing more is done).
>
> Redirects shouldn't be different from
> "ordinary" outlinks and the score
> should be "distributed" from redirect
> source to target. Of course, there
> is only one target per redirect, but
> it may happen that many redirects
> point to the same target.
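The "distribute from redirect source to target" idea can be sketched in plain Java. This is a hypothetical illustration, not Nutch code: it does not use the ScoringFilter interface, and the names `RedirectScoreSketch`, `propagate`, `resolve` and `MAX_HOPS` are invented here. It assumes that a redirect chain (such as URL1 -> URL2 -> URL3) is followed to its final target, that contributions are added to the target's own score rather than copied over it, and that source scores are left unchanged.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Nutch code): add each redirect source's score
 * to the score of the final target of its redirect chain.
 */
public class RedirectScoreSketch {

    /** Assumed limit on the number of redirects followed per chain. */
    static final int MAX_HOPS = 5;

    /**
     * Follows redirects starting at url and returns the first URL that is
     * not itself a redirect, or null on a loop or a chain longer than MAX_HOPS.
     */
    static String resolve(String url, Map<String, String> redirects) {
        Set<String> seen = new HashSet<>();
        String current = url;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            if (!redirects.containsKey(current)) {
                return current; // not a redirect: end of the chain
            }
            if (!seen.add(current)) {
                return null; // redirect loop
            }
            current = redirects.get(current);
        }
        return null; // chain too long
    }

    /**
     * Returns a copy of scores in which every redirect source's original
     * score has been added to its final target. Several redirects pointing
     * to the same target accumulate; the target keeps its own score.
     * Intermediate hops are not credited (an assumed design choice).
     */
    static Map<String, Double> propagate(Map<String, Double> scores,
                                         Map<String, String> redirects) {
        Map<String, Double> result = new HashMap<>(scores);
        for (String source : redirects.keySet()) {
            String target = resolve(source, redirects);
            if (target == null) {
                continue; // skip loops and overly long chains
            }
            // Read from the original map so contributions are not counted twice.
            double contribution = scores.getOrDefault(source, 0.0);
            result.merge(target, contribution, Double::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // The situation from the thread: URL1 -302-> URL2 -301-> URL3.
        Map<String, Double> scores = new HashMap<>();
        scores.put("URL1", 408146.6);
        scores.put("URL2", 0.0);
        scores.put("URL3", 0.0);
        Map<String, String> redirects = new HashMap<>();
        redirects.put("URL1", "URL2");
        redirects.put("URL2", "URL3");
        System.out.println(propagate(scores, redirects).get("URL3")); // prints 408146.6
    }
}
```

Adding rather than copying keeps whatever score the target earned on its own. It does not answer what should happen to the old target's score when a redirect changes its target, so in practice this would belong in an opt-in scoring filter.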

I thought about that too; it cannot be a good idea to just copy over the score. I could set the score back to zero for any high-scoring link, at least for some time, before another one takes it over. And what about the target's own score, which it has earned through some scoring algorithm? It is also a problem when a redirect changes its target: what should we do with the old target's score? Reset it? To what? At first I would think it is a good design not to propagate scores via redirects, but I'm interested. I think a custom scoring filter, plus a call that passes redirects to the scoring filters so they can propagate scores, would be the best solution. This way it would not be enabled by default.

>
> Can you open a Jira issue?
>
> Thanks,
> Sebastian
>
>
> On 12/02/2013 04:57 PM, yann wrote:
> > Hi everybody,
> >
> > I have the following problem:
> >
> > URL1 is redirected (302) to URL2, which is redirected (301) to URL3, and the
> > score in URL1 is lost (not passed to URL2 and URL3).
> >
> > The DB contains:
> >
> > URL1 Version: 7
> > Status: 4 (db_redir_temp)
> > Score: 408146.6
> > Signature: null
> > Metadata: Content-Type: text/html_pst_: temp_moved(13), lastModified=0
> >
> > URL2 Version: 7
> > Status: 5 (db_redir_perm)
> > Score: 0.0
> > Signature: null
> > Metadata: Content-Type: text/html_pst_: moved(12), lastModified=0:
> >
> > URL3 Version: 7
> > Status: 6 (db_notmodified)
> > Score: 0.0
> > Signature: 62bb3b6cb8c8aaab8c5d197a64beedbd
> > Metadata: Content-Type: application/xhtml+xml_pst_: success(1),
> > lastModified=0
> >
> > URL3 shows in the Solr search results, but at the very bottom due to the
> > score 0, which gives a fieldNorm of 0; however this document is very
> > relevant for me, so I don't want it to have a zero score.
> >
> > Is this a bug in Nutch, or is there a config param to say "transfer score
> > when a URL is redirected" that I'm missing? Note that URL2 itself has a
> > score of 0.
> >
> > Thanks for any help,
> >
> > Yann
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Score-value-lost-after-two-successive-redirects-tp4104440.html
> > Sent from the Nutch - User mailing list archive at Nabble.com.