Re: [Wikitech-l] Disabling Messaging Fallbacks for Language Analysis

2017-10-24 Thread Trey Jones
These changes were completed the week of October 9th and deployed the
following week. Re-indexing of the affected wikis finished a few hours
ago, so the changes should now be live everywhere.

The list of affected languages is on the Phab ticket T177871[1], and a list
by wiki is in a comment on that ticket.[2]

For more details, see the write-up on MediaWiki.[3]

[1] https://phabricator.wikimedia.org/T177871
[2] https://phabricator.wikimedia.org/T177871#3702836
[3] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Disabling_Messaging_Fallbacks_for_Language_Analysis


Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation

On Tue, Sep 26, 2017 at 12:12 PM, Trey Jones wrote:

> SUMMARY: The Search Platform team (formerly part of Discovery) is planning
> to fix a long-standing search bug on many wiki projects by disabling the
> code in CirrusSearch that re-uses the “fallback” languages (which are
> specified for user interface or system messages) for the language analysis
> modules (which are used to index words in search). Deployment is planned to
> start the week of October 9, 2017.
>
> Messaging fallbacks specify what language to show a message in when there
> is no message available in the language of a given wiki. A language
> analysis module is language-specific software that processes text to
> improve searching—so that, for example, searching for a given word will
> find related forms of that word, like "hope, hopes, hoping, hoped" or
> "resume, resumé, résumé" on English-language wikis.
>
> Fallback languages for system messages make sense for historical and
> cultural reasons—a reader of the Chechen Wikipedia is more likely to
> understand a user interface or system message in Russian than in French,
> Greek, Hindi, Italian, or Japanese—but the fallbacks don't necessarily make
> any linguistic sense. Chechen and Russian, for example, are from unrelated
> language families; while the languages have undoubtedly influenced one
> another, their grammars are completely different.
>
> We will deploy the software change that disables using messaging fallbacks
> for language analysis fallbacks in about two weeks (targeting the week of
> October 9, 2017), with any cross-language analysis exceptions explicitly
> configured in a new manner. Changes will not immediately happen to all
> affected wikis because each wiki in each language will need to be
> re-indexed, which is a separate process that takes time. There may also be
> other delays caused by Elasticsearch upgrades or other changes that need
> immediate attention.
>
> You can also track progress of the tasks on Phabricator[1] or read more,
> see examples, and get the full list of languages affected on MediaWiki.[2]
>
> [1] https://phabricator.wikimedia.org/T147959
>
> [2] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Disabling_Messaging_Fallbacks_for_Language_Analysis
>
> Trey Jones
> Sr. Software Engineer, Search Platform
> Wikimedia Foundation
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Disabling Messaging Fallbacks for Language Analysis

2017-09-26 Thread Trey Jones
SUMMARY: The Search Platform team (formerly part of Discovery) is planning
to fix a long-standing search bug on many wiki projects by disabling the
code in CirrusSearch that re-uses the “fallback” languages (which are
specified for user interface or system messages) for the language analysis
modules (which are used to index words in search). Deployment is planned to
start the week of October 9, 2017.

Messaging fallbacks specify what language to show a message in when there
is no message available in the language of a given wiki. A language
analysis module is language-specific software that processes text to
improve searching—so that, for example, searching for a given word will
find related forms of that word, like "hope, hopes, hoping, hoped" or
"resume, resumé, résumé" on English-language wikis.

Fallback languages for system messages make sense for historical and
cultural reasons—a reader of the Chechen Wikipedia is more likely to
understand a user interface or system message in Russian than in French,
Greek, Hindi, Italian, or Japanese—but the fallbacks don't necessarily make
any linguistic sense. Chechen and Russian, for example, are from unrelated
language families; while the languages have undoubtedly influenced one
another, their grammars are completely different.

We will deploy the software change that disables using messaging fallbacks
for language analysis fallbacks in about two weeks (targeting the week of
October 9, 2017), with any cross-language analysis exceptions explicitly
configured in a new manner. Changes will not immediately happen to all
affected wikis because each wiki in each language will need to be
re-indexed, which is a separate process that takes time. There may also be
other delays caused by Elasticsearch upgrades or other changes that need
immediate attention.
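
As a rough illustration of the change described above, here is a minimal
sketch in Python. It is not CirrusSearch code: the function names, the
"default" analyzer label, and all of the data below are hypothetical and
exist only to show the difference between reusing messaging fallbacks and
using explicitly configured analysis exceptions.

```python
# Hypothetical data for illustration only; not CirrusSearch's real configuration.
MESSAGE_FALLBACKS = {"ce": ["ru"]}        # Chechen UI messages fall back to Russian
AVAILABLE_ANALYZERS = {"en", "fr", "ru"}  # languages assumed to have an analysis module
ANALYSIS_OVERRIDES = {}                   # explicit cross-language exceptions, if any


def pick_analyzer_old(lang):
    """Old behaviour: borrow the analyzer of a messaging fallback language."""
    if lang in AVAILABLE_ANALYZERS:
        return lang
    for fallback in MESSAGE_FALLBACKS.get(lang, []):
        if fallback in AVAILABLE_ANALYZERS:
            return fallback
    return "default"


def pick_analyzer_new(lang):
    """New behaviour: ignore messaging fallbacks; honour explicit overrides only."""
    if lang in AVAILABLE_ANALYZERS:
        return lang
    override = ANALYSIS_OVERRIDES.get(lang)
    if override in AVAILABLE_ANALYZERS:
        return override
    return "default"


if __name__ == "__main__":
    print(pick_analyzer_old("ce"))
    print(pick_analyzer_new("ce"))
```

With this made-up data, a Chechen wiki would previously have been indexed
with the Russian analyzer, and would now get the default one unless an
override is explicitly configured.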

You can also track progress of the tasks on Phabricator[1] or read more,
see examples, and get the full list of languages affected on MediaWiki.[2]

[1] https://phabricator.wikimedia.org/T147959

[2] https://www.mediawiki.org/wiki/Wikimedia_Discovery/Disabling_Messaging_Fallbacks_for_Language_Analysis

Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation