So I should add "scoring-link" to "plugin.includes" in nutch-site.xml?

Trying it... :)
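
For reference, a minimal sketch of what that change might look like in nutch-site.xml. Only the "plugin.includes" property name and the "scoring-link" entry come from this thread; the other plugins in the value are assumed from a typical Nutch 1.x setup and should be replaced with whatever the existing configuration already lists.

```xml
<!-- nutch-site.xml: sketch only. Every plugin other than scoring-link is an
     assumed placeholder; keep whatever your current value already lists. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-link|urlnormalizer-(pass|regex|basic)</value>
</property>
```

After changing the value, rerunning solrindex and checking the boost attribute of the posted documents shows whether the crawldb score is now picked up.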

On 12.10.2011 15:43, Markus Jelsma wrote:
Not quite. It relies on a scoring filter; check IndexerMapReduce code around
line 181.

On Wednesday 12 October 2011 15:35:08 Marek Bachmann wrote:
Not sure. What does it do? I thought solrindex would take the score
directly from the crawldb?

On 12.10.2011 15:32, Markus Jelsma wrote:
Are you using the scoring-link plugin?

On Wednesday 12 October 2011 15:18:12 Marek Bachmann wrote:
Hey Folks,

Sorry for the second request on this topic. I managed to figure out
that the problem is Nutch-related.

Once again: I have a set of URLs (~182k) fetched, parsed and ranked via
WebGraph. All went very well.

After that I want to index them to Solr. This works fine too, except
that the boost isn't set.

I have debugged this issue for an example URL:

nutch@hrz-pc318:/nutch/dumps/dbdump$ cat part-00001 | grep -A 9 http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html

http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html Version: 7
Status: 2 (db_fetched)
Fetch time: Fri Oct 14 14:03:18 CEST 2011
Modified time: Thu Jan 01 01:00:00 CET 1970
Retries since fetch: 0
Retry interval: 603450 seconds (6 days)
Score: 0.16124992
Signature: 02ab7d9e6655082ff139e8a9c9afb97f
Metadata: _pst_: success(1), lastModified=0

As you can see, the score isn't 1.0.
I ran the solrindex command and logged the traffic via tcpmon. Here is
an extract of the document that is sent to Solr:

POST /solr/update?wt=javabin&version=2 HTTP/1.1
User-Agent:
Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8080
Transfer-Encoding: chunked
Content-Type: application/xml; charset=UTF-8

2000
<add>

           <doc boost="1.0">

                   <field name="site">

                           www.mathematik.uni-kassel.de

                   </field>
                   <field name="host">

                           www.mathematik.uni-kassel.de

                   </field>
                   <field name="lastModified">

                           2008-03-03T13:22:14.000Z

                   </field>
                   <field name="segment">

                           20111007135815

                   </field>
                   <field name="digest">

                           02ab7d9e6655082ff139e8a9c9afb97f

                   </field>
                   <field name="tstamp">

                           2011-10-07T12:25:48.230Z

                   </field>
                   <field name="date">

                           2008-03-03T13:22:14.000Z

                   </field>
                   <field name="type">

                           text/html

                   </field>
                   <field name="id">

http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>

                   <field name="url">

http://www.mathematik.uni-kassel.de/~fgcaadm/fachgruppe-computeralgebra.de/JdM/beitrag-hohenwarter/bezier3cons.html</field>

                   <field name="anchor">

                           bezier3cons.html

                   </field>
                   <field name="content">

                           [...]

                   </field>
                   <field name="title">

                           Kubische Bézierkurve - GeoGebra Dynamisches Arbeitsblatt

                   </field>
                   <field name="boost">

                           1.0

                   </field>
                   <field name="contentLength">

                           1570

                   </field>

           </doc>
           [...]

</add>

So the boost is set to 1.0 even though the crawldb score is 0.16124992. I
can't figure out why this happens. I need your help. :)

