Hi Eyeris

The boost value is simply the score that the ScoringFilters return for a
document. Are you using OPIC?
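
As an illustration of that point, here is a minimal sketch of how a chain of scoring filters could turn the stored CrawlDb score into the document boost. It is a simplified model, not Nutch's actual API: the `ScoringFilter` interface, `OpicLikeFilter`, the `power` parameter and the constant "cash added per refetch" are all hypothetical. It assumes each filter receives the previous filter's output, and that an OPIC-style filter raises the stored score to a power. If the stored score keeps accumulating on every refetch and never decays, the boost grows with it.

```java
import java.util.List;

public class BoostChainSketch {

    // Hypothetical, simplified stand-in for a scoring filter: it takes the
    // stored CrawlDb score and the score produced so far, and returns a new one.
    interface ScoringFilter {
        float indexerScore(float dbScore, float initScore);
    }

    // Hypothetical OPIC-like filter: boost = dbScore^power * initScore.
    static final class OpicLikeFilter implements ScoringFilter {
        private final float power;

        OpicLikeFilter(float power) {
            this.power = power;
        }

        @Override
        public float indexerScore(float dbScore, float initScore) {
            return (float) Math.pow(dbScore, power) * initScore;
        }
    }

    // Runs the filters in order, feeding each one the previous output.
    static float boostFor(List<ScoringFilter> filters, float dbScore) {
        float score = 1.0f; // assumed initial score before any filter runs
        for (ScoringFilter f : filters) {
            score = f.indexerScore(dbScore, score);
        }
        return score;
    }

    public static void main(String[] args) {
        List<ScoringFilter> filters = List.of(new OpicLikeFilter(0.5f));
        float dbScore = 1.0f;
        // Toy model: every refetch adds the same inlink "cash" and nothing
        // ever decays, so the stored score (and thus the boost) only grows.
        float cashPerRefetch = 10.0f;
        for (int refetch = 0; refetch <= 3; refetch++) {
            System.out.printf("refetch %d: dbScore=%.1f boost=%.3f%n",
                    refetch, dbScore, boostFor(filters, dbScore));
            dbScore += cashPerRefetch;
        }
    }
}
```

The point of the toy loop is only that a page refetched often by an adaptive schedule gets more chances to accumulate score, so its boost drifts upward relative to pages fetched rarely.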

Julien

On 20 May 2015 at 19:32, Eyeris Rodríguez Rueda <[email protected]> wrote:

> Hi all.
> I'm using Nutch 1.9 in local mode and Solr 4.10 with half a million
> documents.
> An adaptive fetch schedule is being used to crawl pages that change
> frequently.
> I have noticed that Nutch is calculating an extremely high boost for some
> documents, so the score of these documents in Solr is also extremely high,
> and as a consequence this wrong boost changes the order of the results.
> This is a correct Solr output for me for the query "cubadebate":
> *******************************
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 195
>   },
>   "response": {
>     "numFound": 183486,
>     "start": 0,
>     "maxScore": 2.7115784,
>     "docs": [
>       {
>         "url": "http://www.cubadebate.cu/",
>         "boost": 1.0175576,
>         "score": 2.7115784
>       },
>       {
>         "url": "http://www.cubadebate.cu/editores/preguntas-frecuentes/",
>         "boost": 0.11512774,
>         "score": 0.59315777
>       },
>       {
>         "url": "http://www.cubadebate.cu/editores/",
>         "boost": 0.16240995,
>         "score": 0.50842094
>       },
>       {
>         "url": "http://www.cubadebate.cu/feed/",
>         "boost": 0.8635264,
>         "score": 0.42501986
>       },
>       {
>         "url": "http://www.cubadebate.cu/etiqueta/cine/",
>         "boost": 0.13792185,
>         "score": 0.3541832
>       },
>       {
>         "url": "http://www.cubadebate.cu/web2/",
>         "boost": 0.114989564,
>         "score": 0.3389473
>       },
>       {
>         "url": "http://www.cubadebate.cu/opinion/2015/03/06/diferencias-conciliables/",
>         "boost": 0.18748672,
>         "score": 0.28334656
>       },
>       {
>         "url": "http://www.cubadebate.cu/noticias/2015/02/02/freddy-asiel-voy-por-el-desquite/",
>         "boost": 0.13997546,
>         "score": 0.28334656
>       },
>       {
>         "url": "http://www.cubadebate.cu/especiales/2015/03/05/querido-hugo/",
>         "boost": 0.13172969,
>         "score": 0.28334656
>       },
>       {
>         "url": "http://www.cubadebate.cu/noticias/2015/02/08/grammys-la-lista-completa-de-los-ganadores/comment-page-1/",
>         "boost": 0.12959023,
>         "score": 0.24792825
>       }
>     ]
>   },
> ***********************************************
> This is an incorrect Solr output for the same "cubadebate" query:
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 111
>   },
>   "response": {
>     "numFound": 172952,
>     "start": 0,
>     "maxScore": 22939964,
>     "docs": [
>       {
>         "url": "http://www.tvcubana.icrt.cu/seccion-temas/1088-yo-tambien-estoy-en-la-celac",
>         "boost": 1422334460,
>         "score": 22939964
>       },
>       {
>         "url": "http://www.perlavision.icrt.cu/index.php/deportes/boxeo/14065-domadores-de-cuba-enfrentaran-a-guerreros-de-mexico-en-semifinal-de-la-v-serie-mundial-de-boxeo",
>         "boost": 1675646080,
>         "score": 22476484
>       },
>       {
>         "url": "http://www.radiohc.cu/noticias/deportes/page/387",
>         "boost": 1325039870,
>         "score": 21191032
>       },
>       {
>         "url": "http://www.perlavision.icrt.cu/index.php/bloqueo/13922-nacera-en-mayo-engage-cuba-un-vigoroso-lobby-antibloqueo-en-congreso-de-eeuu",
>         "boost": 1663792640,
>         "score": 18730402
>       },
>       {
>         "url": "http://www.perlavision.icrt.cu/index.php/deportes/boxeo/14004-cuba-en-semifinales-de-serie-mundial-el-proximo-mes",
>         "boost": 1528675840,
>         "score": 18730402
>       },
>       {
>         "url": "http://www.radiohc.cu/noticias/ciencias/page/76",
>         "boost": 1326217090,
>         "score": 18542152
>       },
>       {
>         "url": "http://www.radiohc.cu/noticias/cultura/page/272",
>         "boost": 1327128190,
>         "score": 18542152
>       },
>       {
>         "url": "http://www.tvcubana.icrt.cu/archivo/118-archiv0/1060-beisbol-cubano-sera-el-tema-de-la-mesa-redonda-en-sus-emisiones-de-miercoles-y-jueves",
>         "boost": 1424298370,
>         "score": 18542152
>       },
>       {
>         "url": "http://www.tvcubana.icrt.cu/archivo/118-archiv0/1073-el-programa-nacional-de-medicamentos-en-la-mesa-redonda-miercoles-y-jueves",
>         "boost": 1424231940,
>         "score": 18542152
>       },
>       {
>         "url": "http://www.tvcubana.icrt.cu/archivo/118-archiv0/897-la-mesa-redonda-presentara-miercoles-y-jueves-las-cooerativas-no-agropecuarias-p",
>         "boost": 1424386690,
>         "score": 18542152
>       }
>     ]
>   },
>
> In this case the boost is extremely high.
> I have looked at the solrindexer plugin and found this at line 123:
>   inputDoc.setDocumentBoost(doc.getWeight());
>
> In IndexerMapReduce.java (src/java/org/apache/nutch/indexer), at line 316,
> there is something similar, which I think increases the boost for all
> documents:
>  // apply boost to all indexed fields.
>     doc.setWeight(boost);
>
> I would really appreciate any advice or solution for this problem.
> Thanks in advance.
>
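
The data flow described in the quoted message (the boost stored with `doc.setWeight(boost)` and later passed on with `inputDoc.setDocumentBoost(doc.getWeight())`) can be illustrated with a small sketch. It is only a model: `Doc`, `rank`, `clampBoost` and the `MAX_BOOST` cap are hypothetical, the relevance values are made up, and it assumes the final Solr score is roughly the textual relevance multiplied by the index-time boost, ignoring length normalisation and how Lucene encodes index-time boosts. It shows why a boost in the billions outranks a much better textual match, and one possible, untested workaround: capping the weight before it reaches the indexer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BoostRankingSketch {

    // Hypothetical document: textual relevance for the query plus the
    // index-time boost copied from the Nutch document weight.
    record Doc(String url, float relevance, float boost) {
        // Assumption: final score is roughly relevance * boost (norms ignored).
        float score() {
            return relevance * boost;
        }
    }

    // Hypothetical cap; a sensible value would have to be chosen per crawl.
    static final float MAX_BOOST = 10.0f;

    // Possible workaround: clamp the weight before it is handed to
    // setDocumentBoost(), so a runaway score cannot dominate the ranking.
    static float clampBoost(float weight) {
        if (Float.isNaN(weight) || weight <= 0f) {
            return 1.0f;
        }
        return Math.min(weight, MAX_BOOST);
    }

    // Returns a copy of the documents sorted by descending score.
    static List<Doc> rank(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(Doc::score).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Relevance values are illustrative; the boosts are those of the
        // first document in each of the two quoted Solr outputs.
        List<Doc> docs = List.of(
                new Doc("strong-match", 2.0f, 1.0175576f),
                new Doc("weak-match", 0.01f, 1422334460f));

        System.out.println("raw boosts:");
        for (Doc d : rank(docs)) {
            System.out.println("  " + d.url() + " score=" + d.score());
        }

        List<Doc> capped = new ArrayList<>();
        for (Doc d : docs) {
            capped.add(new Doc(d.url(), d.relevance(), clampBoost(d.boost())));
        }
        System.out.println("capped boosts:");
        for (Doc d : rank(capped)) {
            System.out.println("  " + d.url() + " score=" + d.score());
        }
    }
}
```

Capping only hides the symptom; if the scoring filters are producing runaway values, the stored scores themselves would still need to be looked at.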



-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
