[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-12 Thread dcausse
dcausse added a subscriber: dcausse. dcausse added a comment. //First of all: sorry for all the low level details in this comment but it's always complex to tackle such relevance issues.// I assume that `life` is the query. Wikidata already uses `incoming_link` to boost the top-N results (8196

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T88534: [Story] Implement EntitySearch service on top of Elastic

2015-10-14 Thread dcausse
dcausse added a subscriber: dcausse. TASK DETAIL https://phabricator.wikimedia.org/T88534 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Ricordisamoa, Addshore, Deskana, Manybubbles, Christopher, Wikidata-bugs, hoo, daniel

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-08 Thread dcausse
dcausse moved this task to Needs review on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Updated] T78157: [Story] Implement label prefix search based on Elastic (resp Cirrus, Lucene)

2015-12-02 Thread dcausse
dcausse edited blocking tasks, added: T120089: Add an internal completion or suggestions API to core SearchEngine; removed: T112028: Implement completion suggester as a Beta Feature. TASK DETAIL https://phabricator.wikimedia.org/T78157 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. @aude I can help to write the rescore profiles when you are ready. Also I realized that the example profiles I wrote in Cirrus are wrong: they use "multiply" to combine the scores but it makes no sense : `(weight1 * score1) * (weight2 * score2)`. We might pre
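A minimal sketch of why "multiply" is a poor way to combine weighted scores: `(weight1 * score1) * (weight2 * score2)` equals `(weight1 * weight2) * (score1 * score2)`, so the weights collapse into one constant factor and cannot change the ordering. The documents, scores and weights below are made up for illustration and do not come from Cirrus.

```python
def combine_multiply(weights, scores):
    """Product of the weighted scores: (w1 * s1) * (w2 * s2) * ..."""
    result = 1.0
    for weight, score in zip(weights, scores):
        result *= weight * score
    return result


def combine_add(weights, scores):
    """Weighted sum: w1 * s1 + w2 * s2 + ..."""
    return sum(weight * score for weight, score in zip(weights, scores))


def ranking(docs, weights, combine):
    """Return document names ordered by combined score, best first."""
    return sorted(docs, key=lambda name: combine(weights, docs[name]), reverse=True)


# Hypothetical (text_score, popularity_score) pairs.
docs = {"a": (5.0, 1.0), "b": (1.0, 3.0), "c": (2.0, 2.2)}

# With "multiply", changing the weights never changes the order.
print(ranking(docs, (1, 1), combine_multiply))
print(ranking(docs, (1, 10), combine_multiply))

# With "add", the same weight change does reorder the results.
print(ranking(docs, (1, 1), combine_add))
print(ranking(docs, (1, 10), combine_add))
```

With a sum, each weight controls how much its component contributes, which is what a rescore profile with tunable weights needs.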

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-03 Thread dcausse
dcausse added a comment. We can inhibit tf/idf by setting the weight of the main query to 0 and use either "max" or "add". Note that tf/idf will still play a role to extract the top-N results that will be rescored. N is 8196*7 (number of shards) so if shards are well bala
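The point about the top-N window can be sketched as follows. This is a simplified single-list model, not the Elasticsearch implementation: the function name, the tuple layout and the sample hits are assumptions made for illustration. Only the window size of 8196 per shard and the 7 shards come from the comment.

```python
WINDOW_PER_SHARD = 8196
SHARDS = 7
TOTAL_WINDOW = WINDOW_PER_SHARD * SHARDS  # 57372 documents rescored at most


def rescore(hits, window, query_weight, rescore_weight):
    """Reorder only the first `window` hits of the first-pass ranking.

    hits: list of (doc_id, query_score, rescore_score) tuples.
    The first pass is ordered by query_score (tf/idf in the comment),
    so it decides which documents are rescored at all.
    """
    first_pass = sorted(hits, key=lambda hit: hit[1], reverse=True)
    head, tail = first_pass[:window], first_pass[window:]
    rescored = sorted(
        head,
        key=lambda hit: query_weight * hit[1] + rescore_weight * hit[2],
        reverse=True,
    )
    return [hit[0] for hit in rescored + tail]


# Hypothetical hits: "z" is very popular but has a weak text score.
hits = [("x", 9.0, 1.0), ("y", 8.0, 5.0), ("z", 1.0, 100.0)]

# query_weight=0 removes tf/idf from the final score, yet "z" stays last
# because tf/idf kept it out of the rescore window.
print(rescore(hits, window=2, query_weight=0.0, rescore_weight=1.0))
```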

[Wikidata-bugs] [Maniphest] [Claimed] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-04 Thread dcausse
dcausse claimed this task. TASK DETAIL https://phabricator.wikimedia.org/T110648 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Sjoerddebruin, EBernhardson, aude, dcausse, Deskana, daniel, Mbch331, Aklapper, Lydia_Pintscher, Wikidata

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-12-04 Thread dcausse
dcausse moved this task to In progress on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Changed Project Column] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-17 Thread dcausse
dcausse moved this task to Needs review on the Discovery-Cirrus-Sprint workboard. TASK DETAIL https://phabricator.wikimedia.org/T110648 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results for wikidata

2015-12-17 Thread dcausse
dcausse added a comment. moving back to needs-review as all patches needed in wikidata have been merged. TASK DETAIL https://phabricator.wikimedia.org/T110648 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: gerritbot, Sjoerddebruin

[Wikidata-bugs] [Maniphest] [Commented On] T110648: [Bug] high-ranking items seemed to have dropped significantly in Special:Search results

2015-11-18 Thread dcausse
dcausse added a comment. A big +1. As far as I know it should be pretty straightforward, you just need to implement 2 hooks (`CirrusSearchMappingConfig` and `CirrusSearchBuildDocumentParse`). The profiles (we may want to create multiple profiles with different weights for testing purposes) can

[Wikidata-bugs] [Maniphest] [Unblock] T78157: [Story] Implement label prefix search based on Elastic (resp Cirrus, Lucene)

2016-02-18 Thread dcausse
dcausse closed blocking task T120089: Add an internal completion or suggestions API to core SearchEngine as "Declined". TASK DETAIL https://phabricator.wikimedia.org/T78157 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Ja

[Wikidata-bugs] [Maniphest] [Commented On] T89733: Allow ContentHandler to expose structured data to the search engine.

2016-05-10 Thread dcausse
dcausse added a comment. >> For Wikibase, I am experimenting with unnested fields for multilingual content, though then we have hundreds of unnested fields and not sure there is some point where it's too many? > > Since Elastic says nested field is separate document an

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-18 Thread dcausse
dcausse added a comment. While reading elastic5 breaking change notes I realized that they've added a hard limit on the number of fields in the mapping. The limit is 1000 by default. This limit can be increased by changing the config but we might still want to think of an alternative here just

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread dcausse
dcausse added a comment. There are tons of possibilities and the solution highly depends on the use cases you'd like to support. I think more precise examples would definitely help. Note that the term representation in Elastic is not merely intended as a search index, but also for retrieving all

[Wikidata-bugs] [Maniphest] [Edited] T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap

2016-11-15 Thread dcausse
dcausse edited the task description. (Show Details) EDIT DETAILS... - split CDB from SQL implementation - split CDB from SQL implementation: https://gerrit.wikimedia.org/r/#/c/321674/ - implement array-based InterwikiLookup (loads from multiple JSON or PHP files)... TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Edited] T113034: RFC: Overhaul Interwiki map, unify with Sites and WikiMap

2016-11-18 Thread dcausse
dcausse edited the task description. (Show Details) EDIT DETAILS... - split CDB from SQL implementation: https://gerrit.wikimedia.org/r/#/c/321674/ - implement array-based InterwikiLookup (loads from multiple JSON or PHP files)... TASK DETAIL https://phabricator.wikimedia.org/T113034 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread dcausse
dcausse added a comment. In T150891#2802255, @daniel wrote: "@dcausse I added use cases to the ticket description". Thanks! I think we need to distinguish 2 very different search use cases. Autocomplete: Looking at the current behavior it seems that you display exact matches first

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-13 Thread dcausse
dcausse added a comment. quick draft of a working session with @Smalyshev F5282059: wikidata_prefix_elastic.txt (only addresses completion search for now) TASK DETAIL https://phabricator.wikimedia.org/T150891 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: aude

[Wikidata-bugs] [Maniphest] [Commented On] T173231: Wikidata Elastic search drops results with matches in different language label

2017-08-13 Thread dcausse
dcausse added a comment. I think the solution here is to return offsets alongside the text snippets. There is an offset gap of 1 between array elements, so for instance an array [ "", "Image" ] will have 1 as a starting offset for the query image. The hack would be to send
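A small sketch of the offset arithmetic, assuming offsets are counted in characters and that each array element is separated from the next by a gap of 1, as the comment describes. The helper names are hypothetical and only illustrate how a returned offset could be mapped back to the array element that matched.

```python
OFFSET_GAP = 1  # gap between consecutive array elements, per the comment


def element_offsets(values, gap=OFFSET_GAP):
    """Return the starting offset of each array element."""
    offsets = []
    position = 0
    for value in values:
        offsets.append(position)
        position += len(value) + gap
    return offsets


def element_at(values, offset, gap=OFFSET_GAP):
    """Return the index of the element containing `offset`, or None."""
    for index, start in enumerate(element_offsets(values, gap)):
        if start <= offset < start + len(values[index]):
            return index
    return None


labels = ["", "Image"]
print(element_offsets(labels))  # the empty element takes offset 0, "Image" starts at 1
print(element_at(labels, 1))    # a match starting at offset 1 belongs to element 1
```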

[Wikidata-bugs] [Maniphest] [Commented On] T125500: Index Wikidata labels and descriptions as separate fields in ElasticSearch

2017-07-06 Thread dcausse
dcausse added a comment. The query seems to use a nonexistent field, labels_all.near_match; changing it to labels_all.near_match_folded will fix the highlighting issue. TASK DETAIL https://phabricator.wikimedia.org/T125500 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T175199: Index certain statements for Wikidata items

2017-09-14 Thread dcausse
dcausse added a comment. Right now the field name is statements. I'm not sure whether we should add wb there (everything in that index is "wb", since it's on wikidata). What do you mean by "typed" though? I mean a name that bears the data types it stores, for me "stat

[Wikidata-bugs] [Maniphest] [Commented On] T175199: Index certain statements for Wikidata items

2017-09-08 Thread dcausse
dcausse added a comment. maybe custom analysis components in the extra plugin would make this easier? Unless we have some objections to making wikibase dependent on the wmf elastic plugins? TASK DETAIL https://phabricator.wikimedia.org/T175199 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T175199: Index certain statements for Wikidata items

2017-09-13 Thread dcausse
dcausse added a comment. I like the idea to bind the elastic property to the type of the statement. For now, a mapping written with the default elastic tools can only index nothing or everything, so filtering must be done on the PHP side like you did in the current patch. Moving the filtering

[Wikidata-bugs] [Maniphest] [Commented On] T175199: Index certain statements for Wikidata items

2017-09-07 Thread dcausse
dcausse added a comment. deboosting can happen in the rescore stage, since we use a weighted sum we can either apply a negative penalty when relationship:P31:Q4167410 or a positive value when NOT relationship:P31:Q4167410. Will we add all properties or just a set of selected properties

[Wikidata-bugs] [Maniphest] [Commented On] T182717: Move fine tuning of search configs to mediawiki-config

2017-12-13 Thread dcausse
dcausse added a comment. Finding a decent place for profiles has always been a pain and I could not find something sane. I'd like to address (improve) this problem by adding a ProfileManager in cirrus. The goal would be: have sane defaults (generic profiles) provided in cirrus and/or wikibase that do

[Wikidata-bugs] [Maniphest] [Claimed] T182136: English labels in wikidata prefix search in non-English have low ranking

2017-12-13 Thread dcausse
dcausse claimed this task. dcausse moved this task from In progress to Done on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T182136 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-18 Thread dcausse
dcausse added a comment. If a large majority of such use cases involve searching the entity id (QXXX) of the newly created item we can perform an additional db match to compensate for the lag of the search index. It's what we do for normal wikis, a db match is run in addition to the query sent

[Wikidata-bugs] [Maniphest] [Commented On] T183101: Items missing from Wikidata index

2017-12-18 Thread dcausse
dcausse added a comment. Q45825730 is me, I used this one just to test. TASK DETAIL https://phabricator.wikimedia.org/T183101 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, EBernhardson, dcausse, Smalyshev, Lahi, Gq86, GoranSMilovanovic

[Wikidata-bugs] [Maniphest] [Commented On] T183053: New Wikidata items appear in search with a delay

2017-12-20 Thread dcausse
dcausse added a comment. Same for me, I'd be for trying to increase the refresh rate on wikidata_content. TASK DETAIL https://phabricator.wikimedia.org/T183053 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: debt, jhsoby, Lydia_Pintscher, EBernhardson

[Wikidata-bugs] [Maniphest] [Commented On] T180382: Emptying description on Wikidata doesn't remove it from ElasticSearch database used by wbsearchentities

2017-11-14 Thread dcausse
dcausse added a comment. @Smalyshev it's certainly the case yes, tuning the noop script should fix the issue. TASK DETAIL https://phabricator.wikimedia.org/T180382 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: gerritbot, dcausse, Smalyshev

[Wikidata-bugs] [Maniphest] [Created] T182293: Tune wikidata fulltext search similarity parameters

2017-12-07 Thread dcausse
dcausse created this task. dcausse triaged this task as "Normal" priority. dcausse added projects: Wikidata, Discovery-Search (Current work), Discovery, User-Smalyshev. TASK DESCRIPTION Wikidata uses array fields, it's likely that popular items get more aliases, the all field is affected a

[Wikidata-bugs] [Maniphest] [Commented On] T179091: Add case-insensitive title match capability for Wikidata search

2018-05-09 Thread dcausse
dcausse added a comment. @Smalyshev yes I'm pretty sure it'll work without annoyance. The only possible annoyance I could think about is because ℚ will be folded to Q but since it's not a prefix match it's probably not a big deal... TASK DETAIL https://phabricator.wikimedia.org/T179091 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T179091: Add case-insensitive title match capability for Wikidata search

2018-05-09 Thread dcausse
dcausse added a comment. Sorry, I misread your comment. It should be title.near_match_asciifolding, not title.prefix_asciifolding: the latter would allow finding Q423213 when typing q42, which is not what the task description asks for. TASK DETAIL https://phabricator.wikimedia.org/T179091 EMAIL
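The difference between the two field variants can be illustrated with plain string functions. This is only a stand-in: `str.casefold` approximates case-insensitive matching and is not the asciifolding analyzer used by the real fields, and both function names are hypothetical.

```python
def near_match(title, query):
    """Whole-title match, ignoring case (stand-in for a near_match field)."""
    return title.casefold() == query.casefold()


def prefix_match(title, query):
    """Title starts with the query, ignoring case (stand-in for a prefix field)."""
    return title.casefold().startswith(query.casefold())


titles = ["Q42", "Q423213"]

# Case-insensitive whole-title matching finds only Q42 for "q42".
print([t for t in titles if near_match(t, "q42")])

# Prefix matching also returns Q423213, which is the unwanted behavior.
print([t for t in titles if prefix_match(t, "q42")])
```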

[Wikidata-bugs] [Maniphest] [Commented On] T182717: Move fine tuning of search configs to mediawiki-config

2018-05-11 Thread dcausse
dcausse added a comment. Yes I'll take care of that soon. TASK DETAIL https://phabricator.wikimedia.org/T182717 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: gerritbot, EBernhardson, Smalyshev, dcausse, Aklapper, Versusxo, Majesticalreaper22

[Wikidata-bugs] [Maniphest] [Commented On] T194245: Implement searching of 'depicts' on commons with the 'quantity' qualifier

2018-05-15 Thread dcausse
dcausse added a comment. Few questions: if an image depicts (https://www.wikidata.org/wiki/Q830375) should we find it when searching images that depict a car? if an image depicts 2 x https://www.wikidata.org/wiki/Q37629 and 3 x https://www.wikidata.org/wiki/Q37624 should we consider

[Wikidata-bugs] [Maniphest] [Commented On] T177453: Add wikibase client support for searching wikidata items

2018-05-25 Thread dcausse
dcausse added a comment. Does it mean that we would make WikibaseClient dependent on CirrusSearch and create all necessary query builders into this client? Have we considered the possibility to run an actual API call to wbsearchentit...@wikidata.org? I have no clue if the current API output would

[Wikidata-bugs] [Maniphest] [Commented On] T182717: Move fine tuning of search configs to mediawiki-config

2018-06-27 Thread dcausse
dcausse added a comment. moving back to in progress as the second patch generated some warnings on test servers: [Wed Jun 27 13:49:51 2018] [hphp] [482:7f0a5afff700:37030:01] [] \nWarning: Invalid argument supplied for foreach() in /srv/mediawiki/php-1.32.0-wmf.8/extensions/CirrusSearch

[Wikidata-bugs] [Maniphest] [Commented On] T159924: Create Vagrant role for creating a set of wikis with Wikibase enabled

2018-02-01 Thread dcausse
dcausse added a comment. When enabling the role wikibase_repo it seems that wikibase is enabled for all the wikis. This makes this role problematic when enabled with other extensions like CirrusSearch. CirrusSearch needs to enable many wikis (multilang role) and it's not practical to have wikibase

[Wikidata-bugs] [Maniphest] [Commented On] T177453: Add wikibase client support for searching wikidata items

2018-02-12 Thread dcausse
dcausse added a comment. This is in theory possible but the problem is that some profiles refer to some class implementations that are maybe not available on the host wiki. So yes we could use the sister search logic with some adaptation but the main blocker will be that the builder implementation

[Wikidata-bugs] [Maniphest] [Commented On] T165982: Investigate using blazegraph for deep category searching / returning of results

2018-02-14 Thread dcausse
dcausse added a comment. I think the 1.31.0-wmf21 was cut just before this was merged so this will go to production wikis next week and when the config https://gerrit.wikimedia.org/r/#/c/410242/ is deployed. Earliest would be Friday 23 if we swat the config on Thursday evening. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T163642: Index Wikidata strings in statements for fulltext search

2018-08-21 Thread dcausse
dcausse added a comment. If this ticket is about matching (without using any kind of search keyword) an entity referencing a string regardless of its usage (label/alias/statements) then why not simply put all the statement strings into the text field or auxiliary_text, it's not used currently

[Wikidata-bugs] [Maniphest] [Claimed] T182717: Move fine tuning of search configs to mediawiki-config

2018-03-13 Thread dcausse
dcausse claimed this task. dcausse moved this task from Backlog to In progress on the Discovery-Search (Current work) board. TASK DETAIL https://phabricator.wikimedia.org/T182717 WORKBOARD https://phabricator.wikimedia.org/project/board/1227/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T192345: Make File pages findable via the statement data contained in the search index

2018-04-17 Thread dcausse
dcausse added a comment. This should be pretty straightforward to do. Nitpick: we might consider naming the keyword differently so that it aligns with what we already have (insomething or hassomething). Why not something like haswbstatement? TASK DETAIL https://phabricator.wikimedia.org/T192345 EMAIL

[Wikidata-bugs] [Maniphest] [Commented On] T190066: Expose all slots to the search interface

2018-03-20 Thread dcausse
dcausse added a comment. (sorry I'm very new to MCR) How will this work regarding namespaces? I mean can there be a mix of namespaces here or is there a single top level namespace somewhere? TASK DETAIL https://phabricator.wikimedia.org/T190066 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0

2018-11-20 Thread dcausse
dcausse added a comment. I suggest converting the negative boosts to positive boosts and flipping the filter condition to MUST_NOT; I think we can do this automatically within cirrus. TASK DETAIL https://phabricator.wikimedia.org/T209859 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel
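A sketch of why the suggested conversion keeps the ranking intact, assuming the boost is added to a base score as in a weighted sum. Penalizing documents that match a filter by `-w` and rewarding documents that do not match it by `+w` differ by the constant `w` for every document, so the order is the same while no score drops below its base. The function names, documents and numbers are hypothetical.

```python
def score_with_penalty(base, matches_filter, boost):
    """Original form: a negative boost applied when the filter matches."""
    return base + (boost if matches_filter else 0.0)


def score_with_flipped_boost(base, matches_filter, boost):
    """Converted form: the positive boost -boost applied when the filter does NOT match."""
    return base + (0.0 if matches_filter else -boost)


def ranking(docs, scorer, boost):
    """Return document names ordered by score, best first."""
    return sorted(docs, key=lambda name: scorer(*docs[name], boost), reverse=True)


# Hypothetical (base_score, matches_filter) pairs; "disambig" matches the deboost filter.
docs = {"article": (3.0, False), "disambig": (3.5, True), "other": (1.0, False)}
NEGATIVE_BOOST = -5.0

print(ranking(docs, score_with_penalty, NEGATIVE_BOOST))
print(ranking(docs, score_with_flipped_boost, NEGATIVE_BOOST))

# The penalty form produces a negative score, the flipped form does not.
print(min(score_with_penalty(*docs[d], NEGATIVE_BOOST) for d in docs))
print(min(score_with_flipped_boost(*docs[d], NEGATIVE_BOOST) for d in docs))
```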

[Wikidata-bugs] [Maniphest] [Updated] T215615: Stop using negative scores for deboosting statements

2019-03-26 Thread dcausse
dcausse added a parent task: T218994: Epic: Deprecation warning on elasticsearch 6 . TASK DETAIL https://phabricator.wikimedia.org/T215615 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, alaa_wmde, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Lowered Priority] T219364: Wikidata search lagging behind

2019-03-27 Thread dcausse
dcausse lowered the priority of this task from "Unbreak Now!" to "High". dcausse edited projects, added Discovery-Search (Current work), CirrusSearch, Operations; removed Discovery. dcausse added a comment. Restricted Application edited projects, added Discovery-Search; remov

[Wikidata-bugs] [Maniphest] [Retitled] T219364: Elasticsearch indices went read-only causing huge lag

2019-03-27 Thread dcausse
dcausse renamed this task from "Wikidata search lagging behind" to "Elasticsearch indices went read-only causing huge lag". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T219364 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Updated] T219364: Elasticsearch indices went read-only causing huge lag

2019-03-27 Thread dcausse
dcausse edited projects, added Discovery-Search (Current work); removed Discovery-Search. TASK DETAIL https://phabricator.wikimedia.org/T219364 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Lucas_Werkmeister_WMDE, Smalyshev

[Wikidata-bugs] [Maniphest] [Changed Project Column] T219364: Elasticsearch indices went read-only causing huge lag

2019-03-28 Thread dcausse
dcausse moved this task from in progress to Done on the Discovery-Search (Current work) board. dcausse added a comment. Backlog of updates is now completely absorbed, a script has been run to catch up lost updates, nothing we can do at this point except waiting for the maint script to stop

[Wikidata-bugs] [Maniphest] [Commented On] T124196: Fatal "cannot perform this operation with arrays" from CirrusSearch/ElasticaWrite (using JobQueueDB)

2019-04-01 Thread dcausse
dcausse added a comment. > E.g. avoid queuing updates of this type or this size (possibly configurable), or run them differently, or to try it as today and then catch/suppress the failure - maybe logging a warning in its stead. Imo the JobQueue should raise an error if it's not a

[Wikidata-bugs] [Maniphest] [Updated] T215615: Stop using negative scores for deboosting statements

2019-02-08 Thread dcausse
dcausse added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T215615 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, LawExplorer, _jensen, Wikidata-bugs

[Wikidata-bugs] [Maniphest] [Commented On] T215967: Add keyword for filtering based on captions in specific language

2019-02-17 Thread dcausse
dcausse added a comment. Why not put the languages as a suffix? inlabel:word@en: word in English; inlabel:word@fr*: word in fr and all its fallbacks; inlabel:word@{pt,fr*,-fr-ca}: word in pt or fr and all its fallbacks except fr-ca; inlabel:foo|bar@{pt,fr*,-fr-ca}: foo or bar in pt or fr and all its

[Wikidata-bugs] [Maniphest] [Commented On] T215967: Add keyword for filtering based on captions in specific language

2019-02-18 Thread dcausse
dcausse added a comment. In general ORing a keyword is only meaningful for keywords that match a code, it's rare that people ask us why they can't do intitle:foo OR intitle:bar. But here since we have multiple languages I feel that one may ask for a word in two different languages and prefer having

[Wikidata-bugs] [Maniphest] [Updated] T220823: Use ElasticSearch for bulk Wikidata entity term lookup

2019-04-12 Thread dcausse
dcausse edited projects, added Discovery-Search; removed Discovery. TASK DETAIL https://phabricator.wikimedia.org/T220823 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, alaa_wmde, Addshore, Aklapper, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Merged] T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0

2019-06-04 Thread dcausse
dcausse merged a task: T215615: Stop using negative scores for deboosting statements. TASK DETAIL https://phabricator.wikimedia.org/T209859 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Liuxinyu970226, dcausse, Smalyshev, EBernhardson

[Wikidata-bugs] [Maniphest] [Merged] T215615: Stop using negative scores for deboosting statements

2019-06-04 Thread dcausse
dcausse closed this task as a duplicate of T209859: Wikidata autocomplete (wbsearchentities) results with score <= 0. TASK DETAIL https://phabricator.wikimedia.org/T215615 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper

[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive

2019-06-04 Thread dcausse
dcausse added a comment. we should also note we index this data in the main filter field which means that for searches that are unlikely to be ambiguous (IDs and such) one could simply search for 10.1371/journal.pcbi.1002947 <https://www.wikidata.org/w/index.php?search=10.1371/journal.p

[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive

2019-06-04 Thread dcausse
dcausse added a comment. @Smalyshev I totally agree, I was suggesting a UX where a first attempt search would try to match using the haswbstatement keyword (switched to case insensitive) and then a second try could be made using the fulltext mode if the first attempt is unsuccessful. TASK

[Wikidata-bugs] [Maniphest] [Commented On] T206613: Search of wikidata string property values using haswbstatement is case sensitive

2019-06-04 Thread dcausse
dcausse added a comment. @Smalyshev switching the main field for statements to `lowercase_keyword` won't break anything, it's like a new field it'll be taken into account just after the next reindex. I would advise against a new field here, the cardinality would nearly double. TASK DETAIL

[Wikidata-bugs] [Maniphest] [Changed Status] T202254: Use ExtensionRegistry instead of class_exists to check for CirrusSearch in Wikibase

2019-06-20 Thread dcausse
dcausse changed the task status from "Stalled" to "Open". TASK DETAIL https://phabricator.wikimedia.org/T202254 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Addshore, Aklapper, darthmon_wmde, Nandana, Lahi, Gq8

[Wikidata-bugs] [Maniphest] [Updated] T186037: Need mvn build mode that does not build gui

2019-07-16 Thread dcausse
dcausse added a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAIL https://phabricator.wikimedia.org/T186037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, Smalyshev, darthmon_wmde, ET4Eva, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Created] T231534: Vagrant first provision with wikibase and CirrusSearch causes 1146 Table 'wikidatawiki.wb_items_per_site' doesn't exist

2019-08-29 Thread dcausse
dcausse created this task. dcausse added projects: MediaWiki-Vagrant, CirrusSearch, MediaWiki-extensions-WikibaseClient. Restricted Application added a subscriber: Aklapper. Restricted Application added projects: Wikidata, Discovery-Search. TASK DESCRIPTION ==> default: Notice: /Stage[m

[Wikidata-bugs] [Maniphest] [Changed Project Column] T228503: vagrant wikibase cirrus role not working / should set $wgWBCSUseCirrus = true; to enable

2019-08-29 Thread dcausse
dcausse moved this task from Backlog to Done on the MediaWiki-Vagrant board. dcausse added a comment. The `searchIndexProperties` wikibase config needs to be populated as well. What I did (for the record): - fetch a fresh vagrant - enable wikibasecirrussearch - provision (ran

[Wikidata-bugs] [Maniphest] [Commented On] T228503: vagrant wikibase cirrus role not working / should set $wgWBCSUseCirrus = true; to enable

2019-08-29 Thread dcausse
dcausse added a comment. @Tgr thanks for the info! I've uploaded a patch but I attached the config listing the properties to index under mediainfo, reason is that it's mediainfo that creates these properties. Since we have to create some properties I'm not entirely sure what to do

[Wikidata-bugs] [Maniphest] [Updated] T230862: Create a way to filter only WB-related changes from Commons recentchanges

2019-09-04 Thread dcausse
dcausse added a project: Multimedia. dcausse added a comment. Per T230862#5427914 <https://phabricator.wikimedia.org/T230862#5427914> tagging #Multimedia <https://phabricator.wikimedia.org/tag/multimedia/> TASK DETAIL https://phabricator.wikimedia.org/T230862 EMAIL PREFERE

[Wikidata-bugs] [Maniphest] [Commented On] T232612: WikibaseCirrusSearch emits cirrussearch-too-busy-error errors

2019-09-11 Thread dcausse
dcausse added a comment. The API usage of wbsearchentities is higher than usual (https://graphite.wikimedia.org/S/h). This causes the poolcounter to start rejecting queries. I suppose we could also do a bit better: Reading the code I see: // FIXME: this is a hack, we

[Wikidata-bugs] [Maniphest] [Commented On] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8

2019-07-30 Thread dcausse
dcausse added a comment. If uris.entity().length() is greater than entityId.length() by 8 characters it'll cause this exception. Since it's a test server it's perhaps misconfigured. TASK DETAIL https://phabricator.wikimedia.org/T229329 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Created] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8

2019-07-30 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION #logback.classic pattern: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg %mdc%n

[Wikidata-bugs] [Maniphest] [Edited] T229329: WDQS Updater: java.lang.StringIndexOutOfBoundsException: String index out of range: -8

2019-07-30 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T229329 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE

[Wikidata-bugs] [Maniphest] [Commented On] T173248: Convert blank nodes to “unknown value”

2019-07-17 Thread dcausse
dcausse added a comment. I see that the response is t1514691780 t1514691780 Would

[Wikidata-bugs] [Maniphest] [Claimed] T186037: Need mvn build mode that does not build gui

2019-07-16 Thread dcausse
dcausse claimed this task. dcausse moved this task from Backlog to In progress on the Discovery-Wikidata-Query-Service-Sprint board. TASK DETAIL https://phabricator.wikimedia.org/T186037 WORKBOARD https://phabricator.wikimedia.org/project/board/1239/ EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T186037: Need mvn build mode that does not build gui

2019-07-17 Thread dcausse
dcausse added a comment. We could also use `mvn -pl -gui` which does not require any changes TASK DETAIL https://phabricator.wikimedia.org/T186037 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Gehel, Aklapper, Smalyshev

[Wikidata-bugs] [Maniphest] [Block] T85159: [EPIC] Deploy a Wikidata Query Service into production

2019-10-02 Thread dcausse
dcausse reopened subtask T101013: Switch Wikidata Query Service logging to EventLogging infrastructure as Open. TASK DETAIL https://phabricator.wikimedia.org/T85159 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Smalyshev, dcausse Cc: Multichill

[Wikidata-bugs] [Maniphest] [Reopened] T101013: Switch Wikidata Query Service logging to EventLogging infrastructure

2019-10-02 Thread dcausse
dcausse reopened this task as "Open". TASK DETAIL https://phabricator.wikimedia.org/T101013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Deskana, dcausse Cc: Smalyshev, Deskana, Aklapper, darthmon_wmde, ET4Eva, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Unassigned] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-10-02 Thread dcausse
dcausse removed Deskana as the assignee of this task. dcausse added a project: Discovery-Wikidata-Query-Service-Sprint. TASK DETAIL https://phabricator.wikimedia.org/T101013 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev

[Wikidata-bugs] [Maniphest] [Retitled] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-10-02 Thread dcausse
dcausse renamed this task from "Switch Wikidata Query Service logging to EventLogging infrastructure" to "Log Wikidata Query Service queries to the event gate infrastructure". dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T10101

[Wikidata-bugs] [Maniphest] [Commented On] T101013: Log Wikidata Query Service queries to the event gate infrastructure

2019-11-06 Thread dcausse
dcausse added a comment. Tested sending an event to eventgate in beta generated by new code: curl -XPOST -H"Content-Type: application/json; charset=UTF-8" http://deployment-eventgate-3.deployment-prep.eqiad.wmflabs:8192/v1/events -d '[{"namespace":"ns123&

[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-07 Thread dcausse
dcausse added a comment. Adding a new preference is not really possible, as it would mean indexing the data twice, which we generally don't do. When you say "simple search", do you mean running a full-text search in the Quick Search input box? This again I'm afraid is no

[Wikidata-bugs] [Maniphest] [Edited] T231411: Test new Updater service

2019-11-18 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78, dcausse Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen

[Wikidata-bugs] [Maniphest] [Edited] T231411: Test new Updater service

2019-11-18 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78, dcausse Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen

[Wikidata-bugs] [Maniphest] [Edited] T231411: Test new Updater service

2019-11-18 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T231411 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Igorkim78, dcausse Cc: Lea_Lacroix_WMDE, Gehel, Igorkim78, Aklapper, Daniel_Mietchen, Fnielsen

[Wikidata-bugs] [Maniphest] [Triaged] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-12 Thread dcausse
dcausse edited projects, added Discovery-Search; removed MediaWiki-User-preferences. dcausse triaged this task as "Normal" priority. dcausse added a comment. Thanks! TASK DETAIL https://phabricator.wikimedia.org/T237645 EMAIL PREFERENCES https://phabricator.wikimedia.org/sett

[Wikidata-bugs] [Maniphest] [Retitled] T237645: Reconsider how apostrophes are handled in completion search for wikidata

2019-11-12 Thread dcausse
dcausse renamed this task from "Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF." to "Reconsider how apostrophes are handled in completion search for wikidata". TASK DETAIL https://phabricator.wikimedia.org/T237

[Wikidata-bugs] [Maniphest] [Commented On] T237645: Add Preferences - Search - "Simple search in Completion" (Bool) ON (default)/OFF.

2019-11-12 Thread dcausse
dcausse added a comment. We do already index terms in multiple ways, but I don't think duplicating a field just to let users select how apostrophes should be handled is a good use of server resources. That's why I suggest re-framing this ticket in a simple //Actual behavior

[Wikidata-bugs] [Maniphest] [Created] T238408: Metrics from the wdqs updater are no longer collected

2019-11-15 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION The following metrics have disappeared from the Wikidata Query Service dashboard: - rdf-fetch

[Wikidata-bugs] [Maniphest] [Created] T237612: Investigate flaky tests in LabelServiceUnitTest.unionWithServiceCall_T159723_binds_union

2019-11-07 Thread dcausse
dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION 15:01:41 [ERROR] LabelServiceUnitTest.unionWithServiceCall_T159723_binds_union:364

[Wikidata-bugs] [Maniphest] [Created] T239687: Rework how value and reference changes are handled

2019-12-03 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Discovery-Search (Current work). Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION The current workflow of the updater requires loading the triples

[Wikidata-bugs] [Maniphest] [Claimed] T239687: Rework how value and reference changes are handled

2019-12-03 Thread dcausse
dcausse claimed this task. dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239687 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nan

[Wikidata-bugs] [Maniphest] [Commented On] T239687: Rework how value and reference changes are handled

2019-12-04 Thread dcausse
dcausse added a comment. Some numbers extracted from a dump: - number of values: 20,659,551 - number of unique values: 11,028,526 - number of references: 60,078,314 - number of unique references: 58,876,057 So to the question: > is it worthwhile to dedup values at imp
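As a rough reading of the figures quoted in this comment, the share of duplicates in each category can be worked out directly. The sketch below is an illustrative calculation only and is not part of the original comment; the helper name `dedup_savings` is hypothetical, and the four counts are the ones listed above.

```python
# Counts quoted in the comment above: (total, unique) per category.
counts = {
    "values": (20_659_551, 11_028_526),
    "references": (60_078_314, 58_876_057),
}


def dedup_savings(total, unique):
    """Return (duplicate_count, duplicate_fraction) for one category."""
    duplicates = total - unique
    return duplicates, duplicates / total


# Hypothetical helper applied to both categories.
savings = {name: dedup_savings(t, u) for name, (t, u) in counts.items()}

for name, (dups, frac) in savings.items():
    print(f"{name}: {dups:,} duplicates ({frac:.1%} of total)")
```

On these numbers, a little under half of the values are duplicates, while only about 2% of the references are.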

[Wikidata-bugs] [Maniphest] [Edited] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239414 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, darthmon_wmde, DannyS712

[Wikidata-bugs] [Maniphest] [Edited] T239414: Investigate how blank nodes are used and synced between wikibase and wdqs

2019-12-03 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T239414 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Smalyshev, Lucas_Werkmeister_WMDE, Igorkim78, dcausse, Aklapper, darthmon_wmde, DannyS712

[Wikidata-bugs] [Maniphest] [Created] T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

2019-12-03 Thread dcausse
dcausse created this task. dcausse added projects: Wikidata-Query-Service, Discovery-Search (Current work). Restricted Application added a subscriber: Aklapper. Restricted Application added a project: Wikidata. TASK DESCRIPTION Seen on wdqs1004 after enabling async imports 20:29:03.200

[Wikidata-bugs] [Maniphest] [Triaged] T239750: org.wikidata.query.rdf.tool.Updater - Importer error: ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access

2019-12-03 Thread dcausse
dcausse triaged this task as "Medium" priority. TASK DETAIL https://phabricator.wikimedia.org/T239750 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: dcausse, Aklapper, darthmon_wmde, DannyS712, Nandana,

[Wikidata-bugs] [Maniphest] [Edited] T240334: Evaluate adding all/more textual properties to the text field

2019-12-10 Thread dcausse
dcausse updated the task description. TASK DETAIL https://phabricator.wikimedia.org/T240334 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden

[Wikidata-bugs] [Maniphest] [Retitled] T240334: Evaluate adding all/some textual properties to the text field

2019-12-10 Thread dcausse
dcausse renamed this task from "Evaluate adding all/more textual properties to the text field" to "Evaluate adding all/some textual properties to the text field". TASK DETAIL https://phabricator.wikimedia.org/T240334 EMAIL PREFERENCES https://phabricator.wikimedi

[Wikidata-bugs] [Maniphest] [Retitled] T240334: Evaluate adding all/more textual properties to the text field

2019-12-10 Thread dcausse
dcausse renamed this task from "\Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties" to "Evaluate adding all/more textual properties to the text field". dcausse lowered the priority of this task from "High" to "Medium"

[Wikidata-bugs] [Maniphest] [Closed] T239898: Investigate triple counts difference between dumps and what blazegraph reports

2019-12-10 Thread dcausse
dcausse closed this task as "Invalid". dcausse added a comment. I recounted the triples in the dump properly (using an RDF parser) after the munge operation and found 8.9B triples; closing as invalid. TASK DETAIL https://phabricator.wikimedia.org/T239898 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] [Triaged] T240334: \Wikibase\EntityContent::getTextForSearchIndex no longer includes textual properties

2019-12-10 Thread dcausse
dcausse triaged this task as "High" priority. TASK DETAIL https://phabricator.wikimedia.org/T240334 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, GoranSM
