aude added a comment.

Possible ways of handling multilingual indexing of labels (and other Wikibase 
term types):

**Multilingual indexing**

//multiple fields by language//

  "page": {
    "dynamic": "false",
    "_all": {
      "enabled": false
    },
    "properties": {
      "description_de": {
        "type": "string"
      },
      "description_en": {
        "type": "string"
      },
      "description_es": {
        "type": "string"
      },
      "label_de": {
        "type": "string"
      },
      "label_en": {
        "type": "string"
      },
      "label_es": {
        "type": "string"
      }
    }
  }

pros:

- ...

cons:

- multiple fields have the disadvantage that there would potentially be a 
very large number of them (one for every language * three term types).
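
To make the field count concrete, here is a minimal sketch that generates the per-language mapping above and counts the resulting fields. The language list, the `build_flat_properties` helper, and the use of "alias" as the third term type are assumptions for illustration, not part of the actual Cirrus / Wikibase code.

```python
import json

# Illustrative inputs only: a tiny language list and three term types.
# "alias" as the third term type is an assumption of this sketch.
LANGUAGES = ["de", "en", "es"]
TERM_TYPES = ["label", "description", "alias"]


def build_flat_properties(languages, term_types):
    """Return one string field per (term type, language) pair."""
    return {
        f"{term}_{lang}": {"type": "string"}
        for term in term_types
        for lang in languages
    }


properties = build_flat_properties(LANGUAGES, TERM_TYPES)
mapping = {
    "page": {
        "dynamic": "false",
        "_all": {"enabled": False},
        "properties": properties,
    }
}

print(len(properties))  # 3 languages * 3 term types = 9 fields
print(json.dumps(mapping, indent=2, sort_keys=True))
```

The count grows linearly with the number of languages, so a few hundred language codes would already mean on the order of a thousand fields in the mapping.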

//Nested type//

  "page": {
    "dynamic": "false",
    "_all": {
      "enabled": false
    },
    "properties": {
      "descriptions": {
        "type": "nested",
        "properties": {
          "de": {
            "type": "string"
          },
          "en": {
            "type": "string"
          },
          "es": {
            "type": "string"
          }
        }
      },
      "labels": {
        "type": "nested",
        "properties": {
          "de": {
            "type": "string"
          },
          "en": {
            "type": "string"
          },
          "es": {
            "type": "string"
          }
        }
      }
    }
  }

pros:

- ...

cons:

- nested can be a problem when the nesting gets very large, which it would.
- Elasticsearch seems to have a problem with multiple (nested) fields with the 
same name, such as 'en' nested under 'descriptions' and 'en' also nested under 
'labels'.  Unless there is a workaround, we might have to include a prefix for 
each language field, such as 'label_en' and 'description_en', to disambiguate 
them.

To start with, this is what I am experimenting with, but I am not convinced 
this is what we want.
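
The prefix workaround mentioned in the cons above could look roughly like the following sketch, which builds the nested mapping with either bare language codes or prefixed field names. The `build_nested_properties` helper and its `prefix_fields` flag are hypothetical names used only for illustration.

```python
def build_nested_properties(languages, term_types, prefix_fields=True):
    """Build nested properties keyed by plural term type.

    term_types maps the plural container name to the singular prefix,
    e.g. {"labels": "label"}. With prefix_fields=True the leaf fields are
    named "label_en" etc., so no two leaves share a name.
    """
    props = {}
    for plural, singular in term_types.items():
        inner = {}
        for lang in languages:
            name = f"{singular}_{lang}" if prefix_fields else lang
            inner[name] = {"type": "string"}
        props[plural] = {"type": "nested", "properties": inner}
    return props


TERM_TYPES = {"labels": "label", "descriptions": "description"}
nested = build_nested_properties(["de", "en", "es"], TERM_TYPES)

print(sorted(nested["labels"]["properties"]))
# ['label_de', 'label_en', 'label_es']
```

This only removes the name clash; it does not reduce the size of the nesting, which still grows with the number of languages.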

//Language-specific child documents//

Language specific content (terms) could be split up and stored in child 
documents.

For language fallback, search / lookup could request a handful of languages and 
not have to retrieve all child documents.
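
As a rough illustration of the idea, the sketch below splits an entity's terms into one child document per language and then resolves a label through a fallback chain, looking only at the requested languages. It is plain Python with no Elasticsearch calls; the document shape, the helper names, and the sample entity data are all assumptions of this sketch.

```python
def split_into_children(entity_id, terms):
    """Turn {term_type: {lang: text}} into one child document per language."""
    by_lang = {}
    for term_type, values in terms.items():
        for lang, text in values.items():
            child = by_lang.setdefault(
                lang, {"parent": entity_id, "language": lang}
            )
            child[term_type] = text
    return list(by_lang.values())


def resolve_label(children, fallback_chain):
    """Return (language, label) for the first chain language with a label."""
    # Only the child documents for the requested languages are considered.
    wanted = {
        c["language"]: c for c in children if c["language"] in fallback_chain
    }
    for lang in fallback_chain:
        child = wanted.get(lang)
        if child and "label" in child:
            return lang, child["label"]
    return None


# Sample data for illustration only.
terms = {
    "label": {"en": "Berlin", "es": "Berlín"},
    "description": {
        "en": "capital of Germany",
        "de": "Hauptstadt von Deutschland",
    },
}
children = split_into_children("Q64", terms)

print(resolve_label(children, ["de", "en"]))  # ('en', 'Berlin')
```

In a real index the filtering step would be part of the query against the child documents rather than done in application code.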

Pros:

- avoids the large nesting
- if one label is updated, only one child document needs to be updated rather 
than the entire parent document. In practice with Cirrus, though, I am not 
sure it would work this way.

Cons:

- somewhat slower to query
- requires more memory to query the child documents


TASK DETAIL
  https://phabricator.wikimedia.org/T117520

To: aude
Cc: aude, Deskana, StudiesWorld, Aklapper, Smalyshev, Wikidata-bugs, Mbch331, 
jeremyb