Are you using the scoring-link plugin?
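
If it is not enabled, the indexer may not carry the crawldb score over into the
document boost. A minimal sketch of the nutch-site.xml entry follows. The plugin
list is an assumption: it is based on a typical Nutch 1.x default with
scoring-opic swapped for scoring-link, so adjust it to your own setup.

```xml
<!-- Sketch only: the plugin list is assumed, not taken from your configuration. -->
<property>
  <name>plugin.includes</name>
  <!-- scoring-link replaces scoring-opic so that WebGraph scores are used -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-link|urlnormalizer-(pass|regex|basic)</value>
</property>
```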

On Wednesday 12 October 2011 15:18:12 Marek Bachmann wrote:
> Hey Folks,
> 
> sorry for raising this topic a second time. I managed to figure out
> that the problem is Nutch-related.
> 
> Once again: I have a set of URLs (~182k) fetched, parsed and ranked via
> WebGraph. All went very well.
> 
> After that I want to index them into Solr. This works fine too, except
> that the boost isn't set.
> 
> I have debugged this issue for an example url:
> 
> nutch@hrz-pc318:/nutch/dumps/dbdump$ cat part-00001 | grep -A 9
> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/
> JdM/beitrag-hohenwarter/bezier3cons.html
> 
> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/
> JdM/beitrag-hohenwarter/bezier3cons.html Version: 7
> Status: 2 (db_fetched)
> Fetch time: Fri Oct 14 14:03:18 CEST 2011
> Modified time: Thu Jan 01 01:00:00 CET 1970
> Retries since fetch: 0
> Retry interval: 603450 seconds (6 days)
> Score: 0.16124992
> Signature: 02ab7d9e6655082ff139e8a9c9afb97f
> Metadata: _pst_: success(1), lastModified=0
> 
> You can see that the score isn't 1.0.
> I ran the solrindex command and logged the traffic via tcpmon. Here is
> an extract of the document that is sent to Solr:
> 
> POST /solr/update?wt=javabin&version=2 HTTP/1.1
> User-Agent:
> Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
> Host: localhost:8080
> Transfer-Encoding: chunked
> Content-Type: application/xml; charset=UTF-8
> 
> 2000
> <add>
>          <doc boost="1.0">
>                  <field name="site">
>                          www.mathematik.uni-kassel.de
>                  </field>
>                  <field name="host">
>                          www.mathematik.uni-kassel.de
>                  </field>
>                  <field name="lastModified">
>                          2008-03-03T13:22:14.000Z
>                  </field>
>                  <field name="segment">
>                          20111007135815
>                  </field>
>                  <field name="digest">
>                          02ab7d9e6655082ff139e8a9c9afb97f
>                  </field>
>                  <field name="tstamp">
>                          2011-10-07T12:25:48.230Z
>                  </field>
>                  <field name="date">
>                          2008-03-03T13:22:14.000Z
>                  </field>
>                  <field name="type">
>                          text/html
>                  </field>
>                  <field name="id">
> 
> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/
> JdM/beitrag-hohenwarter/bezier3cons.html </field>
>                  <field name="url">
> 
> http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/
> JdM/beitrag-hohenwarter/bezier3cons.html </field>
>                  <field name="anchor">
>                          bezier3cons.html
>                  </field>
>                  <field name="content">
>                          [...]
>                  </field>
>                  <field name="title">
>                          Kubische Bézierkurve - GeoGebra Dynamisches
> Arbeitsblatt
>                  </field>
>                  <field name="boost">
>                          1.0
>                  </field>
>                  <field name="contentLength">
>                          1570
>                  </field>
>          </doc>
>          [...]
> </add>
> 
> So the boost is set to 1.0. I can't figure out why this happens. I need
> your help. :)

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
