[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-02-10 Thread Deskana
Deskana added a comment. In T150891#3018424, @Smalyshev wrote: @Deskana it is not affecting the users immediately. This particular ticket just talks about finding the right format. Then we have to implement it (ongoing), deploy it, test it, figure out proper scoring, turn it on as replacement for

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-02-10 Thread Smalyshev
Smalyshev added a comment. @Deskana it is not affecting the users immediately. This particular ticket just talks about finding the right format. Then we have to implement it (ongoing), deploy it, test it, figure out proper scoring, turn it on as replacement for the current search - then we can anno

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-02-10 Thread Deskana
Deskana added a comment. Is this affecting users on Wikidata, or is it infrastructure to build towards that? It's hard for me to tell from the task description and comments. I'd like to know whether to include it in the Discovery weekly update.

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-02-08 Thread Smalyshev
Smalyshev added a comment. So far per-language fields seem to work fine, so I think we can proceed with this scheme.

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-18 Thread Smalyshev
Smalyshev added a comment. Current plan, as agreed at the meeting: (1) Create a mock schema with language fields for current Wikidata. (2) Create a script to quickly produce data for this scheme from JSON-dumped entities. (3) Load it into the DB cluster and see whether it causes any issues and whether we can s
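
Step (2) of the plan above could be sketched roughly as follows. This is only an illustration, not the script that was actually written: it assumes the standard Wikidata JSON entity layout (`labels` and `aliases` keyed by language code), and the output document shape and the name `entity_to_doc` are hypothetical.

```python
from collections import defaultdict


def entity_to_doc(entity):
    """Flatten one JSON-dumped entity into a document with one list of terms per language.

    The output layout ({"id": ..., "labels": {lang: [terms]}}) is an assumption
    made for this sketch, not the schema agreed on in the task.
    """
    terms = defaultdict(list)
    # Labels: exactly one value per language.
    for lang, label in entity.get("labels", {}).items():
        terms[lang].append(label["value"])
    # Aliases: zero or more values per language, appended after the label.
    for lang, aliases in entity.get("aliases", {}).items():
        terms[lang].extend(alias["value"] for alias in aliases)
    return {"id": entity["id"], "labels": dict(terms)}


sample = {
    "id": "Q42",
    "labels": {
        "en": {"language": "en", "value": "Douglas Adams"},
        "de": {"language": "de", "value": "Douglas Adams"},
    },
    "aliases": {
        "en": [{"language": "en", "value": "Douglas Noel Adams"}],
    },
}
doc = entity_to_doc(sample)
```

Each language then becomes its own sub-field of `labels`, which is what makes the per-language mapping (and its field count) grow with the number of languages.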

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-18 Thread dcausse
dcausse added a comment. While reading elastic5 breaking change notes I realized that they've added a hard limit on the number of fields in the mapping. The limit is 1000 by default. This limit can be increased by changing the config but we might still want to think of an alternative here just in c
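
To make the concern about the field limit concrete, here is a back-of-the-envelope sketch. The setting name `index.mapping.total_fields.limit` and its default of 1000 are from Elasticsearch 5; the language count and fields-per-language figures below are made-up illustration values, not numbers from this task.

```python
# Elasticsearch 5 default for index.mapping.total_fields.limit.
DEFAULT_FIELD_LIMIT = 1000


def fields_needed(language_count, fields_per_language, other_fields=0):
    """Rough count of mapping fields for a one-field-per-language scheme."""
    return language_count * fields_per_language + other_fields


def limit_settings(required, headroom=100):
    """Index settings body that raises the limit above the required count."""
    return {"index.mapping.total_fields.limit": required + headroom}


# Hypothetical numbers: 400 languages, 3 fields each (e.g. label plus
# two analyzed variants), and 50 unrelated fields.
needed = fields_needed(language_count=400, fields_per_language=3, other_fields=50)
exceeds_default = needed > DEFAULT_FIELD_LIMIT
settings = limit_settings(needed)
```

With these assumed numbers the scheme needs 1250 fields and so exceeds the default; the actual figure depends on how many sub-fields each language gets.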

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-17 Thread Smalyshev
Smalyshev added a comment. @aude would it be ok if I continued this from here?

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-13 Thread dcausse
dcausse added a comment. quick draft of a working session with @Smalyshev F5282059: wikidata_prefix_elastic.txt (only addresses completion search for now)

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2017-01-07 Thread Lydia_Pintscher
Lydia_Pintscher added a comment. Yeah there are a few other things that need the move to Elastic (for example better ranking of suggestions). So if we can move this forward with WMF help that'd be awesome.

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-21 Thread daniel
daniel added a comment. @Lydia_Pintscher How does this rank on your road map? Getting this done would be really helpful - in particular, it would prevent the terms table from blowing up in our face one fine Friday evening...

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-19 Thread daniel
daniel added a comment. @EBernhardson wow, index expansion galore...

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-19 Thread daniel
daniel added a comment. Ah, a note about priorities: use case one (completion match) is by far the most pressing need for us. It puts massive load on the DB server, and it's triggered several times whenever a user uses a search field.

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread daniel
daniel added a comment. To clarify: when you say of a given entity the input is an entity ID or a search string? Yes. Ideally, a list of potentially many entity IDs. We will need to know which result corresponds to which ID. If we refer to a search string then it's a matter of highlighting and di

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread dcausse
dcausse added a comment. In T150891#2802255, @daniel wrote: @dcausse I added use cases to the ticket description. Thanks! I think we need to distinguish two very different search use cases. Autocomplete: Looking at the current behavior it seems that you display exact matches first and then prefi

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread daniel
daniel added a comment. @dcausse I added use cases to the ticket description

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-17 Thread dcausse
dcausse added a comment. There are tons of possibilities and the solution highly depends on the use cases you'd like to support. I think more precise examples would definitely help. Note that the term representation in Elastic is not merely intended as a search index, but also for retrieving all label

[Wikidata-bugs] [Maniphest] [Commented On] T150891: Find a good way to represent multi-lingual text fields in Elastic

2016-11-16 Thread daniel
daniel added a comment. @Smalyshev What about one field per language, then? Is that feasible? We would match one field OR a second OR a third... I think the max size of a fallback chain is about 10 languages (for all the Chinese variants falling back on each other).
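
The "one field OR a second OR a third" idea can be sketched as an Elasticsearch `bool` query with one `should` clause per language in the fallback chain. This is a minimal illustration under assumptions: the `labels.<lang>` field naming, the function name `fallback_query`, the example chain, and the boost scheme (earlier languages rank higher) are all hypothetical and not taken from the task.

```python
def fallback_query(text, fallback_chain, field_prefix="labels"):
    """Build a query body matching `text` in any per-language field of the chain.

    Languages earlier in the chain get a larger boost, so a hit in the
    user's own language outranks a hit in a fallback language.
    """
    clauses = []
    for rank, lang in enumerate(fallback_chain):
        boost = float(len(fallback_chain) - rank)
        clauses.append(
            {"match": {f"{field_prefix}.{lang}": {"query": text, "boost": boost}}}
        )
    # At least one of the per-language clauses has to match.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


# Illustrative chain only; real fallback chains come from the wiki configuration.
query = fallback_query("北京", ["zh-hk", "zh-hant", "zh"])
```

With a chain of about 10 languages, as estimated above, the query would carry about 10 such clauses.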