Hi,

The scores in the index are not relevant for generating; only the scores in
the CrawlDb are.
The ScoringFilter interface defines a method indexerScore(...), and some
scoring filters return a modified (normalized) indexer score
(cf. indexer.score.power). Also, changes to generate.min.score affect only
which pages are fetched from then on; pages fetched before the change may
have a lower score.
The score may also change when a page is processed (parsed, etc.) or even
afterwards (by links pointing to it).
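
To illustrate, here is a toy model in Python. It is not Nutch code: the
dictionary layout and the functions generate() and index() are made up for
this sketch, and it assumes only that the threshold is applied to fetch
candidates at generate time and that indexing does not apply it.

```python
# Toy model only, not Nutch code: data layout and function names are hypothetical.
GENERATE_MIN_SCORE = 0.05  # stands in for generate.min.score

# A miniature "CrawlDb": one page was fetched before the threshold was set.
crawldb = {
    "http://example.org/a": {"status": "fetched", "score": 0.01},
    "http://example.org/b": {"status": "unfetched", "score": 0.02},
    "http://example.org/c": {"status": "unfetched", "score": 0.30},
}


def generate(db, min_score):
    """Select unfetched entries with a score larger than min_score."""
    return sorted(
        url
        for url, entry in db.items()
        if entry["status"] == "unfetched" and entry["score"] > min_score
    )


def index(db):
    """Index every fetched page with its score; no threshold is applied here."""
    return {
        url: entry["score"]
        for url, entry in db.items()
        if entry["status"] == "fetched"
    }


fetch_list = generate(crawldb, GENERATE_MIN_SCORE)
indexed = index(crawldb)

print(fetch_list)
print(indexed)
```

Only page "c" makes it onto the fetch list, while the already fetched page
"a" is indexed with its score of 0.01, below the threshold.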

In short: generate.min.score determines what is crawled, not what is indexed.

Best,
Sebastian

On 04/18/2017 12:31 AM, Yongyao Jiang wrote:
> Hi,
> 
> I am using scoring-similarity plugin. After setting the generate.min.score
> to 0.05, and indexing all the pages (with its score) into Elastic, I can
> still observe many web pages whose scores are below 0.05.
> 
> <property>
>   <name>generate.min.score</name>
>   <value>0.05</value>
>   <description>Select only entries with a score larger than
>   generate.min.score.</description>
> </property>
> 
> Below is the result of a simple aggregation of "score" in ES,
>         {
>                "key": "20170417215917",
>                "doc_count": 200,
>                "Stats": {
>                   "count": 200,
>                   "min": 0,
>                   "max": 0.019184709,
>                   "avg": 0.0012828724450000002,
>                   "sum": 0.256574489
>                }
>             }
> 
> Thanks,
> Yongyao
> 
