On 11.07.2011 14:36, Markus Jelsma wrote:
On Monday 11 July 2011 14:28:31 Marek Bachmann wrote:
On 09.07.2011 19:20, Markus Jelsma wrote:
Score and fetch time, if I'm not mistaken.
BTW: I am still not too familiar with Nutch scoring. It is a little
bit confusing that the word "scoring" seems to be used for two things.
First, I think there is a score which Nutch calculates for each page,
based on the inlinks from other pages to this page.
Second, there is the scoring in Lucene for search requests, which is
calculated while performing a request.
Scoring is determined by your scoring plugin, if you use one, such as OPIC.
It's not uncommon not to use scoring on the Nutch side. It has nothing to do
with Lucene scoring anymore.
Now it seems to me that the score calculated by a scoring plugin is
only used by the generator for selecting the URLs?
In other words, if I index my fetched and parsed segments to Solr, the
scoring values from the Nutch plugins have no effect?
As you see by my imprecise questions, I am very confused about that.
Can somebody please confirm or correct me, if I am right with these theses?
Furthermore, can someone explain, or point to some resources on, how "the
crawl time scoring" is performed in Nutch?
What do you mean by `the crawl time scoring`?
Yes, I mean the scoring calculated by the nutch scoring plugin(s).
That would really help me.
When we see: "Generator: Selecting best-scoring urls due for fetch"
What are the criteria for best-scoring URLs?
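
For illustration, the "score and fetch time" answer given in this thread can be sketched as below. This is only a minimal sketch under stated assumptions, not Nutch's actual Generator code: the names CrawlEntry and select_for_fetch, the field names, and the sample URLs are hypothetical. It assumes that a URL is "due" once its scheduled fetch time has passed, and that the due URLs are ranked by the score assigned by the scoring plugin, with the top N being selected.

```python
from dataclasses import dataclass


@dataclass
class CrawlEntry:
    # Hypothetical stand-in for one crawl database record.
    url: str
    score: float      # score assigned by a scoring plugin (assumed)
    fetch_time: int   # time at which the URL is next due, in epoch seconds


def select_for_fetch(entries, now, top_n):
    """Return the top_n highest-scoring entries that are due for fetch."""
    # Step 1: keep only entries whose scheduled fetch time has passed.
    due = [e for e in entries if e.fetch_time <= now]
    # Step 2: rank the due entries by score, highest first.
    due.sort(key=lambda e: e.score, reverse=True)
    # Step 3: cut the list off at the requested size.
    return due[:top_n]


entries = [
    CrawlEntry("http://example.org/a", score=2.5, fetch_time=900),
    CrawlEntry("http://example.org/b", score=9.0, fetch_time=2000),  # not due yet
    CrawlEntry("http://example.org/c", score=1.0, fetch_time=500),
    CrawlEntry("http://example.org/d", score=0.3, fetch_time=100),
]

selected = select_for_fetch(entries, now=1000, top_n=2)
print([e.url for e in selected])
```

Note that in this sketch the highest-scoring URL (b) is skipped because it is not yet due, which is the sense in which both score and fetch time matter.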