aude added a comment.
Possible ways of handling multilingual indexing of labels (and other Wikibase
term types):
**Multilingual indexing**
//multiple fields by language//
"page": {
"dynamic": "false",
"_all": {
"enabled": false
},
"properties": {
"description_de": {
"type": "string"
},
"description_en": {
"type": "string"
},
"description_es": {
"type": "string"
},
"label_de": {
"type": "string"
},
"label_en": {
"type": "string"
},
"label_es": {
"type": "string"
}
}
}
pros:
- ...
cons:
- multiple fields have the disadvantage that there would potentially be a
very large number of them (one for every language * three term types)
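To make the field-count concern concrete, here is a minimal sketch that generates the flat mapping above from a language list. The helper name `build_flat_mapping` and the language list are hypothetical, and the three term types are assumed to be labels, descriptions and aliases.

```python
import json

# Assumption: the three Wikibase term types are labels, descriptions and aliases.
TERM_TYPES = ("label", "description", "alias")


def build_flat_mapping(languages, term_types=TERM_TYPES):
    """Return a 'page' mapping with one string field per (term type, language)."""
    properties = {
        f"{term}_{lang}": {"type": "string"}
        for term in term_types
        for lang in sorted(languages)
    }
    return {
        "page": {
            "dynamic": "false",
            "_all": {"enabled": False},
            "properties": properties,
        }
    }


if __name__ == "__main__":
    mapping = build_flat_mapping(["de", "en", "es"])
    print(json.dumps(mapping, indent=2, sort_keys=True))
    # The number of fields grows as len(languages) * len(term_types).
    print(len(mapping["page"]["properties"]))
```

The field count is simply languages * term types, so a list of, say, 300 language codes would give 900 fields in a single mapping.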
//Nested type//
"page": {
"dynamic": "false",
"_all": {
"enabled": false
},
"properties": {
"descriptions": {
"type": "nested",
"properties": {
"de": {
"type": "string"
},
"en": {
"type": "string"
},
"es": {
"type": "string"
}
}
},
"labels": {
"type": "nested",
"properties": {
"de": {
"type": "string"
},
"en": {
"type": "string"
},
"es": {
"type": "string"
}
}
}
}
}
pros:
- ...
cons:
- nested can be a problem when the nesting gets very large, which it would.
- Elasticsearch seems to have a problem with multiple (nested) fields with the same
name, such as 'en' nested under 'descriptions' and 'en' also nested under
'labels'. Unless there is a workaround, we might have to include a prefix for
each language field, such as 'label_en' and 'description_en', to disambiguate
them.
To start with, this is what I am experimenting with, but I am not convinced it is
what we want.
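A rough sketch of the prefix workaround mentioned in the cons, assuming it is applied to the nested mapping above. The helper `build_nested_mapping` is hypothetical, and whether prefixed names actually avoid the clash in Elasticsearch has not been verified here.

```python
def build_nested_mapping(languages, prefix_fields=True):
    """Return a 'page' mapping with one nested object per term type.

    With prefix_fields=True the inner fields are named e.g. 'label_en' and
    'description_en' instead of a bare 'en' under both parents.
    """
    # Nested object name -> prefix used for its language fields.
    groups = {"labels": "label", "descriptions": "description"}
    properties = {}
    for group, singular in groups.items():
        inner = {}
        for lang in languages:
            name = f"{singular}_{lang}" if prefix_fields else lang
            inner[name] = {"type": "string"}
        properties[group] = {"type": "nested", "properties": inner}
    return {
        "page": {
            "dynamic": "false",
            "_all": {"enabled": False},
            "properties": properties,
        }
    }
```

With `prefix_fields=True`, no inner field name is shared between 'labels' and 'descriptions'. With `prefix_fields=False` the output matches the unprefixed mapping shown above.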
//Language-specific child documents//
Language-specific content (terms) could be split up and stored in child
documents.
For language fallback, search / lookup could request a handful of languages and
not have to retrieve all child documents.
Pros:
- won't have the large nesting
- if one label is updated, only one child document needs to be updated rather than
the entire parent document, though in practice with Cirrus I am not sure it would
work this way.
Cons:
- somewhat slower to query
- requires more memory to query the child documents
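As an illustration of the child-document idea, here is a minimal sketch of what a child mapping and a fallback lookup could look like. Everything named here is an assumption and not part of the proposal: the child type `term`, its fields `term_type`, `language` and `text`, and both helper functions. The `_parent` setting and `not_analyzed` strings assume the same Elasticsearch generation as the `string` mappings above.

```python
def build_term_child_mapping(parent_type="page"):
    """Mapping for a hypothetical 'term' child type, one document per term."""
    return {
        "term": {
            "_parent": {"type": parent_type},
            "properties": {
                # Exact-match fields used to select which children to fetch.
                "term_type": {"type": "string", "index": "not_analyzed"},
                "language": {"type": "string", "index": "not_analyzed"},
                # The searchable label / description / alias text itself.
                "text": {"type": "string"},
            },
        }
    }


def pick_with_fallback(child_docs, fallback_chain, term_type="label"):
    """Return (language, text) for the first language in the chain that has a term.

    child_docs is the handful of child documents already fetched for the
    languages in fallback_chain; returns None if none of them matches.
    """
    by_language = {
        doc["language"]: doc["text"]
        for doc in child_docs
        if doc["term_type"] == term_type
    }
    for lang in fallback_chain:
        if lang in by_language:
            return lang, by_language[lang]
    return None
```

For example, given child documents holding a 'de' and an 'en' label, calling `pick_with_fallback(docs, ["de-at", "de", "en"])` would return the 'de' label, and only the children for those three languages would need to be retrieved.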
TASK DETAIL
https://phabricator.wikimedia.org/T117520