On 20/09/2017 at 03:40, Trey Jones wrote:

    Anyway, would it be a big deal to show the transliterated results
    with less weight in ranking?

Doing any special weighting would be more difficult, but they would already be naturally ranked lower for not being exact matches. (You can see this at work if you compare the results for /resume/, /resumé/, and /résumé/ on English Wikipedia, for example.)

Interesting to know. Thank you.

    Actually, add an option button in advanced search in any case, and
    just limit the discussion to whether it should be opt-in or opt-out.


There are longer-term plans for revamping advanced search capabilities, so if we want to go that route, it's doable, but it would definitely be on hold for a while. Options that have been mentioned include a special-case keyword like "kana:オオカミ", or a more generic keyword like "phonetic:オオカミ" that was smart enough to know what to do with kana, but might do something different with other characters... but that's all at the vague ideation stage right now.

Well, I would expect "phonetic:" to be tied to something like IPA, but the idea of a keyword is interesting.

Thanks!


Trey Jones
Sr. Software Engineer, Search Platform
Wikimedia Foundation

On Tue, Sep 19, 2017 at 8:29 PM, mathieu stumpf guntz <[email protected]> wrote:



    On 19/09/2017 at 23:47, Trey Jones wrote:
    We recently got a suggestion via Phabricator[1] to automatically map
    between hiragana and katakana when searching on English Wikipedia and other
    wiki projects. As an always-on feature, this isn't difficult to implement,
    but major commercial search engines (Google.jp, Bing, Yahoo Japan,
    DuckDuckGo, Goo) don't do that. They give different results when searching
    for hiragana/katakana forms (for example, オオカミ/おおかみ "wolf"). They also give
    different *numbers* of results, seeming to indicate that it's not just
    re-ordering the same results (say, so that results in the same script are
    ranked higher).[2] I want to know what they know that I don't!

    Does anyone have any thoughts on whether this would be useful (seems that
    it would) and whether it would cause any problems (it must, or otherwise
    all the other search engines would do it, right?).

    Well, maybe. Or not. Look at how DuckDuckGo continues to offer only
    a "country" option for filtering *languages*. The two might be
    complementary, but personally I'm generally more interested in the
    latter, all the more so when I'm using a language that no country
    uses as an official language. :)

    Anyway, would it be a big deal to show the transliterated results
    with less weight in ranking? Actually, add an option button in
    advanced search in any case, and just limit the discussion to
    whether it should be opt-in or opt-out.

    Any idea why it might be different between a Japanese-language wiki and a
    non-Japanese-language wiki? We often are more aggressive in matching
    between characters that are not native to a given language--for example,
    accents on Latin characters are generally ignored on English-language
    wikis. So it might make sense to merge hiragana and katakana on
    English-language wikis but not Japanese-language wikis.
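
As an illustration of the accent-ignoring behaviour mentioned above, here is a rough sketch using Python's standard `unicodedata` module: decompose each character and drop the combining marks. The function name is hypothetical, and this only approximates what a real search analyzer does.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks so accented Latin letters match their base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the combining accents produced by NFD.
    # Caveat: this naive filter also drops the kana voicing marks U+3099/U+309A,
    # so it would turn ガ into カ; a real analyzer would restrict it by script.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for word in ("resume", "resumé", "résumé"):
        print(word, "->", fold_accents(word))
```

With folding like this, all three spellings retrieve the same pages, while an exact match on the unfolded form can still rank higher. The caveat in the comment is one concrete reason to treat a Japanese-language wiki differently from an English-language one.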

    Thanks very much for any suggestions or information!
    —Trey

    どういたしまして。 (You're welcome.)
    [1] https://phabricator.wikimedia.org/T176197
    [2] Details of my tests at https://phabricator.wikimedia.org/T173650#3580309

    Trey Jones
    Sr. Software Engineer, Search Platform
    Wikimedia Foundation
    _______________________________________________
    Wikitech-l mailing list
    [email protected]
    https://lists.wikimedia.org/mailman/listinfo/wikitech-l



