[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2016-03-18 Thread Dereckson
Dereckson added a comment. @aude @dcausse If you still wish to test patch 257607, I've moved the settings to -labs, as suggested in Gerrit by @EBernhardson before the code freeze. TASK DETAIL https://phabricator.wikimedia.org/T110648

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2016-02-03 Thread aude
aude added a comment. i'm working in the current wikidata sprint to add labels to cirrus and experiment with rescoring.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2016-01-28 Thread Deskana
Deskana added a comment. Since there's been no activity here in the past week, I'm going to take this out of https://phabricator.wikimedia.org/tag/discovery-search-sprint/. @aude, please ping us when you need code review, and we'll be there to help you out. :-)

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2016-01-20 Thread EBernhardson
EBernhardson added a comment. deployment freeze is over, but it looks like @aude is still working out some adjustments to the profiles before we go live with this.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-28 Thread EBernhardson
EBernhardson added a comment. main patch is now reviewed and merged. The change to operations/mediawiki-config to make the new profiles available is mergeable but needs to be done after the deployment freeze is over.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-17 Thread dcausse
dcausse added a comment. moving back to needs-review as all patches needed in wikidata have been merged.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-08 Thread gerritbot
gerritbot added a subscriber: gerritbot. gerritbot added a comment. Change 257607 had a related patch set uploaded (by DCausse): Add initial rescore profiles for wikidata https://gerrit.wikimedia.org/r/257607

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread aude
aude added a comment. one approach for main snak links could be to extract them when generating parser output and then store them in yet another table similar to page links. alternatively, generating this data using hadoop could work

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. @aude I can help to write the rescore profiles when you are ready. Also I realized that the example profiles I wrote in Cirrus are wrong: they use "multiply" to combine the scores but it makes no sense : `(weight1 * score1) * (weight2 * score2)`. We might prefer to use
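The point about "multiply" can be checked with a small sketch. This is not CirrusSearch code: the function names, the sample scores, and the "sum" mode used for contrast are assumptions made for illustration. Because `(weight1 * score1) * (weight2 * score2)` equals `(weight1 * weight2) * (score1 * score2)`, the weights collapse into one constant factor and cannot change the ordering of results, whereas an additive combination does respond to them.

```python
from math import prod

def combine(scores, weights, mode):
    """Combine per-signal scores after applying each signal's weight."""
    weighted = [w * s for w, s in zip(weights, scores)]
    if mode == "multiply":
        return prod(weighted)
    if mode == "sum":
        return sum(weighted)
    raise ValueError(f"unknown mode: {mode}")

def ranking(docs, weights, mode):
    """Return document names ordered from best to worst combined score."""
    return sorted(docs, key=lambda name: combine(docs[name], weights, mode),
                  reverse=True)

# Hypothetical (score1, score2) pairs for three documents.
docs = {"a": (1.0, 9.0), "b": (4.0, 4.0), "c": (8.0, 1.0)}

# Under "multiply" the weights factor out as a constant: the order is unchanged.
multiply_even = ranking(docs, (1.0, 1.0), "multiply")
multiply_skewed = ranking(docs, (10.0, 0.1), "multiply")

# Under "sum" the same change of weights reorders the documents.
sum_even = ranking(docs, (1.0, 1.0), "sum")
sum_skewed = ranking(docs, (10.0, 0.1), "sum")
```

With these sample values the two "multiply" rankings are identical, while the two "sum" rankings differ, which is the behaviour the comment objects to.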

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. We can inhibit tf/idf by setting the weight of the main query to 0 and use either "max" or "add". Note that tf/idf will still play a role to extract the top-N results that will be rescored. N is 8196*7 (number of shards) so if shards are well balanced we should cover
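A rough sketch of the two claims in this comment, assuming a single shard and hypothetical hit data; the `rescore` function below is illustrative and is not the Elasticsearch or CirrusSearch API. It shows that a main-query weight of 0 removes tf/idf from the final score, while tf/idf still decides which hits enter the rescore window. The window arithmetic uses the figures quoted in the comment.

```python
WINDOW_PER_SHARD = 8196  # figure quoted in the comment
SHARDS = 7               # figure quoted in the comment
TOTAL_WINDOW = WINDOW_PER_SHARD * SHARDS  # upper bound on rescored hits

def rescore(hits, window, query_weight, rescore_weight, mode):
    """Rescore the top `window` hits; hits are (name, main_score, rescore_score)."""
    # The main (tf/idf) score alone selects which hits get rescored.
    by_main = sorted(hits, key=lambda h: h[1], reverse=True)
    top, rest = by_main[:window], by_main[window:]

    def final(hit):
        main = query_weight * hit[1]
        extra = rescore_weight * hit[2]
        return max(main, extra) if mode == "max" else main + extra

    # Hits outside the window keep their place below the rescored ones.
    rescored = sorted(top, key=final, reverse=True)
    return [h[0] for h in rescored + rest]

# Hypothetical hits: "w" has the best rescore signal but the worst tf/idf score.
hits = [("x", 5.0, 1.0), ("y", 4.0, 7.0), ("z", 3.0, 9.0), ("w", 1.0, 100.0)]

order_max = rescore(hits, window=3, query_weight=0.0, rescore_weight=1.0, mode="max")
order_add = rescore(hits, window=3, query_weight=0.0, rescore_weight=1.0, mode="add")
```

In this sketch "w" stays last under both modes because it never enters the window, even though its rescore signal is the strongest.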

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread daniel
daniel added a comment. @dcausse: If the scores are comparable, I suggest we use max, not avg or sum. If they are not comparable, then we can't use sum/avg/max, we'll have to use some sort of product. We could play with log scaling and see if it helps. In any case, I believe we should not use
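As a sketch of the log-scaling idea for scores that are not comparable, the snippet below multiplies a text score by a link-based boost, once with the raw count and once with a log-scaled count. The `1 + log1p(count)` form, the function names, and the sample numbers are all assumptions chosen for illustration, not a formula from the task.

```python
import math

def raw_boost(link_count):
    """Use the link count directly as a multiplier."""
    return float(link_count)

def log_boost(link_count):
    """Log-scale the count; the +1 keeps items with zero links from scoring 0."""
    return 1.0 + math.log1p(link_count)

def rank(docs, boost):
    """Order documents by text_score * boost(link_count), best first."""
    return sorted(docs, key=lambda n: docs[n][0] * boost(docs[n][1]), reverse=True)

# Hypothetical (text_score, incoming_link_count) pairs.
docs = {"p": (0.5, 100000), "q": (3.0, 57)}

raw_order = rank(docs, raw_boost)  # the raw count swamps the text score
log_order = rank(docs, log_boost)  # log scaling lets the text score matter
```

With these values the raw product puts "p" first purely on link count, while the log-scaled product puts "q" first.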

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread aude
aude added a comment. In https://phabricator.wikimedia.org/T110648#1847639, @dcausse wrote: > @aude I can help to write the rescore profiles when you are ready. > > Also I realized that the example profiles I wrote in Cirrus are wrong: they > use "multiply" to combine the scores but it makes no

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread EBernhardson
EBernhardson added a comment. I pulled in a wikidata dump to our hypothesis testing cluster a couple weeks ago, but haven't done anything with it. It contains 18.8M documents so should be pretty much the whole thing. With the train rolling forward i don't see any reason we can't push things

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-02 Thread Deskana
Deskana added a comment. This task seems to have dropped off the radar, somewhat; is further discussion needed, or is this awaiting action from someone (either Discovery or Wikidata)?

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-18 Thread dcausse
dcausse added a comment. A big +1. As far as I know it should be pretty straightforward, you just need to implement 2 hooks (`CirrusSearchMappingConfig` and `CirrusSearchBuildDocumentParse`). The profiles (we may want to create multiple profiles with different weights for testing purpose) can

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-18 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Sounds good to me too.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-17 Thread aude
aude added a comment. What do folks think of adding a field that contains sitelink count as next step? We consider site link count (+ label count) in the wikibase entity suggester (which uses db backend now) and it does decently okay considering it is a crude measure. I think we could then

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread dcausse
dcausse added a subscriber: dcausse. dcausse added a comment. //First of all: sorry for all the low level details in this comment but it's always complex to tackle such relevance issues.// I assume that `life` is the query. Wikidata already uses `incoming_link` to boost the top-N results (8196

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread aude
aude added a subscriber: aude. aude added a comment. https://www.wikidata.org/w/index.php?title=Special:WhatLinksHere/Q3=57 shows 57 incoming links which is same as what https://www.wikidata.org/wiki/Q3?action=cirrusdump says. the item has been edited recently and think cirrus is up-to-date.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread aude
aude added a comment. in our own "search" based on the wb_terms table, we use term_weight which considers the number of site links an item has and the number of labels it has. we should probably add these as properties in the cirrus index. that might be the easiest/quickest thing to do and

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Thanks so much for looking into this!

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-10 Thread Deskana
Deskana added a comment. Adding this to https://phabricator.wikimedia.org/tag/discovery-cirrus-sprint/ to investigate whether there's a cause to this that we are aware of.

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-10-13 Thread daniel
daniel added a comment. We should also boost matches on labels and aliases. That's probably a little harder to do, but should not be terrible.