Unfortunately, that leaves us holding the bag on how to fix it, and
I am at a loss for anything short of some hard-coded reduction
ratio/factor for all mail scores.
Perhaps it's just something we have to handle in the front ends. Mail
and chats are stored in separate indexes; maybe we should just stick
to that for the front ends as well.
-Kevin Kubasik
On 12/17/05, Debajyoti Bera [EMAIL PROTECTED] wrote:
I have noticed that mail messages seem to get unusually high scores
from the indexer. While holmes makes the problem much less of an issue
(since it separates the conversation results), it still seems like
something worth fixing. I can't figure out exactly why the
scoring is so off, but an initial guess would be the ease with which
we can add hotwords for email (subject lines) as opposed to most other
backends.
(from http://wiki.apache.org/jakarta-lucene/LuceneFAQ )
Lucene automatically adds a weight inversely proportional to the length of the
field, i.e. terms in short fields (like sender name, email address, subject)
will get a higher weight (known as 'boost') than terms in text. The same holds
for document metadata: it gets more weight than document data/text.
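
To make the effect concrete, here is a rough sketch of that length
normalization. It assumes the classic Lucene default of
1/sqrt(number of terms in the field); the function name and the term counts
are made up for illustration and are not taken from Beagle.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length factor in the style of classic Lucene: 1/sqrt(number of terms).

    Assumption: this mirrors Lucene's default similarity; Beagle's actual
    configuration may differ.
    """
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field sizes: a 4-term subject line vs. a 400-term message body.
subject_norm = length_norm(4)
body_norm = length_norm(400)

# The same term matching in the subject is weighted far more heavily
# than a match in the body.
print(subject_norm > body_norm)
```

With these numbers, a hit in the short subject field is weighted about ten
times as much as the same hit in the body, which would fit the high mail
scores.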
(from my understanding)
Beagle searches several Lucene indexes and merges the results based on their
scores. Somewhere during the process, it recalculates the score based on the
age of the document. However, the absolute values of Lucene scores are not
directly comparable; only the ratio (and hence the ranking) between scores
from the same index is comparable. In that sense, I don't think scores across
multiple indexes should be directly compared. Ranking within a particular
backend is meaningful and IMO, that is the correct way to do it.
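
A small sketch of the difference, with invented hits and scores (the backend
names, URIs, and the helper function are all hypothetical, not Beagle code):
a naive merge sorts everything by raw score, while the per-backend approach
only ranks hits against others from the same index.

```python
from collections import defaultdict


def rank_per_backend(hits):
    """Group (backend, uri, score) hits by backend and rank each group
    by score, highest first. Scores are never compared across backends."""
    groups = defaultdict(list)
    for backend, uri, score in hits:
        groups[backend].append((uri, score))
    return {
        backend: [uri for uri, _ in sorted(g, key=lambda h: h[1], reverse=True)]
        for backend, g in groups.items()
    }


# Invented example data: mail scores are inflated relative to file scores.
hits = [
    ("files", "f2", 0.1),
    ("mail", "m2", 0.8),
    ("files", "f1", 0.3),
    ("mail", "m1", 0.9),
]

# Naive merge: compares raw scores across indexes, so mail floods the top.
naive = [uri for _, uri, _ in sorted(hits, key=lambda h: h[2], reverse=True)]

# Per-backend ranking: each backend keeps its own meaningful ordering.
per_backend = rank_per_backend(hits)

print(naive)
print(per_backend)
```

A front end can then show each backend's list separately, the way the mail
and chat results are already kept apart.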
- dBera
___
Dashboard-hackers mailing list
Dashboard-hackers@gnome.org
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
--
Kevin Kubasik
240-838-6616