Dereckson added a comment.
@aude @dcausse If you still wish to test patch 257607, I've moved the
settings to -labs, as suggested in Gerrit by @EBernhardson before the code
freeze.
TASK DETAIL
https://phabricator.wikimedia.org/T110648
EMAIL PREFERENCES
aude added a comment.
I'm working in the current Wikidata sprint to add labels to Cirrus and
experiment with rescoring.
Deskana added a comment.
Since there's been no activity here in the past week, I'm going to take this
out of https://phabricator.wikimedia.org/tag/discovery-search-sprint/. @aude,
please ping us when you need code review, and we'll be there to help you out.
:-)
EBernhardson added a comment.
The deployment freeze is over, but it looks like @aude is still working out some
adjustments to the profiles before we go live with this.
EBernhardson added a comment.
The main patch is now reviewed and merged. The change to
operations/mediawiki-config to make the new profiles available is mergeable but
must wait until the deployment freeze is over.
dcausse added a comment.
Moving back to needs-review, as all patches needed in Wikidata have been merged.
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.
Change 257607 had a related patch set uploaded (by DCausse):
Add initial rescore profiles for wikidata
https://gerrit.wikimedia.org/r/257607
aude added a comment.
One approach for main snak links could be to extract them when generating
parser output and then store them in yet another table, similar to page links.
Alternatively, generating this data using Hadoop could work.
dcausse added a comment.
@aude I can help to write the rescore profiles when you are ready.
Also, I realized that the example profiles I wrote in Cirrus are wrong: they use
"multiply" to combine the scores, which makes no sense: `(weight1 * score1) *
(weight2 * score2)`. We might prefer to use
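To illustrate the point about "multiply": `(weight1 * score1) * (weight2 * score2)` equals `(weight1 * weight2) * (score1 * score2)`, so positive weights only scale every document by the same constant and cannot change the ordering. The sketch below uses made-up scores and hypothetical helper names (`rank`, `multiply`, `add`); it is not the Cirrus profile code.

```python
# Made-up (score1, score2) pairs for three documents; purely illustrative.
DOCS = {"A": (10.0, 1.0), "B": (2.0, 4.0), "C": (5.0, 3.0)}


def multiply(w1, w2):
    """Combine two weighted scores by multiplication."""
    return lambda s1, s2: (w1 * s1) * (w2 * s2)


def add(w1, w2):
    """Combine two weighted scores by addition."""
    return lambda s1, s2: w1 * s1 + w2 * s2


def rank(docs, combine):
    """Return document ids sorted by combined score, best first."""
    ordered = sorted(docs.items(), key=lambda kv: combine(*kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ordered]


# With "multiply", changing the weights leaves the ranking untouched.
print(rank(DOCS, multiply(1.0, 1.0)))
print(rank(DOCS, multiply(0.1, 50.0)))

# With "add", the weights actually shift the ranking.
print(rank(DOCS, add(1.0, 1.0)))
print(rank(DOCS, add(1.0, 5.0)))
```

With an additive combination the weights express how much each signal matters, which is presumably the intent of having them in a profile.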
dcausse added a comment.
We can inhibit tf/idf by setting the weight of the main query to 0 and use
either "max" or "add". Note that tf/idf will still play a role to extract the
top-N results that will be rescored. N is 8196*7 (number of shards) so if
shards are well balanced we should cover
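A minimal sketch of the windowed rescoring described above, assuming each shard first picks its top-N hits by tf/idf and only those hits are rescored. The function names, the tiny window, and all scores are made up for illustration; the per-shard figure is taken from the comment as written. It is a simplification (hits outside the window are dropped here), not the search engine's actual implementation.

```python
NUM_SHARDS = 7
WINDOW_PER_SHARD = 8196  # figure quoted in the comment
TOTAL_WINDOW = WINDOW_PER_SHARD * NUM_SHARDS  # 57372 candidates overall

COMBINERS = {"add": lambda a, b: a + b, "max": max}


def rescore_shard(hits, window, query_weight, rescore_weight, mode):
    """hits: list of (doc_id, tfidf_score, boost_score) for one shard."""
    # tf/idf still decides which hits make it into the rescore window.
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:window]
    combine = COMBINERS[mode]
    return [
        (doc_id, combine(query_weight * tfidf, rescore_weight * boost))
        for doc_id, tfidf, boost in top
    ]


def search(shards, window, query_weight=0.0, rescore_weight=1.0, mode="add"):
    """Rescore each shard's window, then merge by final score."""
    merged = []
    for hits in shards:
        merged.extend(rescore_shard(hits, window, query_weight, rescore_weight, mode))
    return sorted(merged, key=lambda h: h[1], reverse=True)


# Two toy shards. Q3 has the largest boost but the weakest tf/idf score.
SHARD_HITS = [
    [("Q1", 3.0, 10.0), ("Q2", 2.0, 50.0), ("Q3", 0.5, 900.0)],
    [("Q4", 4.0, 5.0), ("Q5", 1.0, 70.0)],
]

for doc_id, score in search(SHARD_HITS, window=2):
    print(doc_id, score)
```

In this toy run Q3 never appears: with a query weight of 0 the final order depends only on the boost, but tf/idf still determined which hits were eligible for rescoring.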
daniel added a comment.
@dcausse: If the scores are comparable, I suggest we use max, not avg or sum.
If they are not comparable, then we can't use sum/avg/max, we'll have to use
some sort of product. We could play with log scaling and see if it helps.
In any case, I believe we should not use
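As a rough sketch of the log-scaling idea, assuming a raw count such as incoming links needs to be brought into the same range as another score before "max" is meaningful: the helper below maps a count into [0, 1] on a log scale. The function name and the ceiling value are assumptions for illustration, not values from the task.

```python
import math


def log_scale(value, ceiling):
    """Compress a non-negative raw value into [0, 1] on a log scale.

    `ceiling` is an assumed upper bound; anything above it saturates at 1.0.
    """
    return min(math.log1p(value) / math.log1p(ceiling), 1.0)


ASSUMED_LINK_CEILING = 100_000  # hypothetical cap, to be tuned

# Raw counts differ by orders of magnitude; scaled values stay in [0, 1].
for links in (0, 57, 1_000, 100_000):
    print(links, log_scale(links, ASSUMED_LINK_CEILING))

# If the text relevance is also normalized to [0, 1] (an assumption),
# the two signals become comparable and "max" can be applied.
text_relevance = 0.4
print(max(text_relevance, log_scale(57, ASSUMED_LINK_CEILING)))
```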
aude added a comment.
In https://phabricator.wikimedia.org/T110648#1847639, @dcausse wrote:
> @aude I can help to write the rescore profiles when you are ready.
>
> Also I realized that the example profiles I wrote in Cirrus are wrong: they
> use "multiply" to combine the scores but it makes no
EBernhardson added a comment.
I pulled in a wikidata dump to our hypothesis testing cluster a couple weeks
ago, but haven't done anything with it. It contains 18.8M documents so should
be pretty much the whole thing.
With the train rolling forward I don't see any reason we can't push things
Deskana added a comment.
This task seems to have dropped off the radar, somewhat; is further discussion
needed, or is this awaiting action from someone (either Discovery or Wikidata)?
dcausse added a comment.
A big +1.
As far as I know it should be pretty straightforward, you just need to
implement 2 hooks (`CirrusSearchMappingConfig` and
`CirrusSearchBuildDocumentParse`).
The profiles (we may want to create multiple profiles with different weights
for testing purposes) can
Lydia_Pintscher added a comment.
Sounds good to me too.
aude added a comment.
What do folks think of adding a field that contains the sitelink count as a next
step? We consider sitelink count (+ label count) in the Wikibase entity
suggester (which uses the db backend now) and it does decently okay considering it
is a crude measure.
I think we could then
dcausse added a subscriber: dcausse.
dcausse added a comment.
//First of all: sorry for all the low level details in this comment but it's
always complex to tackle such relevance issues.//
I assume that `life` is the query.
Wikidata already uses `incoming_link` to boost the top-N results (8196
aude added a subscriber: aude.
aude added a comment.
https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Q3=57
shows 57 incoming links, which is the same as what
https://www.wikidata.org/wiki/Q3?action=cirrusdump says.
The item has been edited recently and I think Cirrus is up to date.
aude added a comment.
In our own "search" based on the wb_terms table, we use term_weight, which
considers the number of sitelinks an item has and the number of labels it has.
We should probably add these as properties in the Cirrus index.
That might be the easiest/quickest thing to do and
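A minimal sketch of deriving the two counts from an entity, assuming the entity is available in the Wikibase JSON layout with `labels` and `sitelinks` maps. The field names `label_count` and `sitelink_count` and the helper name are hypothetical, and the sample entity is trimmed for illustration.

```python
def popularity_fields(entity):
    """Return extra index fields counting an entity's labels and sitelinks."""
    return {
        "label_count": len(entity.get("labels", {})),
        "sitelink_count": len(entity.get("sitelinks", {})),
    }


# Trimmed, illustrative entity; a real item has many more entries.
ENTITY = {
    "id": "Q3",
    "labels": {
        "en": {"language": "en", "value": "life"},
        "de": {"language": "de", "value": "Leben"},
    },
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Life"},
    },
}

print(popularity_fields(ENTITY))
```

Fields like these could then be referenced by a rescore profile, in the same way `incoming_link` is used for boosting elsewhere in this thread.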
Lydia_Pintscher added a comment.
Thanks so much for looking into this!
Deskana added a comment.
Adding this to https://phabricator.wikimedia.org/tag/discovery-cirrus-sprint/
to investigate whether there's a cause to this that we are aware of.
daniel added a comment.
We should also boost matches on labels and aliases. That's probably a little
harder to do, but should not be terrible.