So in this scenario, are all categories planned to be sorted by Jyutping? If so, we could make a collation, in which case categories would automatically be sorted that way, and you would just add the category to a page in the normal way (by writing [[Category:Foo]] with no sortkey). The downside would be that all categories would have to use Jyutping.
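To make the difference concrete, here is a rough sketch of what sorting by a Jyutping key does compared with the default code-point order. It is written in Python purely so it can be run outside a wiki. The six table entries, the function name jyutping_sortkey and the space-separated key format are all assumptions made up for illustration. This is not how a MediaWiki collation is implemented, and a real table would need to cover far more characters.

```python
# Illustrative entries only (an assumption for this sketch); a real lookup
# table would cover on the order of 10k characters.
JYUTPING = {
    "香": "hoeng1",
    "港": "gong2",
    "中": "zung1",
    "文": "man4",
    "廣": "gwong2",
    "東": "dung1",
}


def jyutping_sortkey(title: str) -> str:
    """Build a sort key by replacing each known character with its Jyutping.

    Characters missing from the table are kept unchanged, so they simply
    sort by their own code point within the key.
    """
    return " ".join(JYUTPING.get(ch, ch) for ch in title)


titles = ["香港", "中文", "廣東"]

# Default behaviour: order by Unicode code point.
by_codepoint = sorted(titles)

# Proposed behaviour: order by the romanised key instead.
by_jyutping = sorted(titles, key=jyutping_sortkey)

print(by_codepoint)
print(by_jyutping)
```

With a collation, this key would be computed for every category member automatically. With the DEFAULTSORT approach, the same key would be written into each page instead, which is what leaves room for per-page exceptions.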
> mw.loadData format

It's just a normal Lua table format. https://www.mediawiki.org/wiki/Module:ExtensionJson is an example. There are of course limits on the maximum size of a page (I think it's 1 or 2 MB). But based on the size of https://raw.githubusercontent.com/MacroYau/PyJyutping/master/pyjyutping/data/jyutping_dictionary.json you will probably be within the size limits.

> (b) it will be notably faster, because it runs directly on PHP, and not through the additional layer of Lua.

Personally, I am unconvinced the speed will be significantly different.

--
Brian

On Sun, Mar 15, 2020 at 11:19 PM Huji Lee <[email protected]> wrote:

> The additional information you provided was helpful. I still think the
> best approach is to have an extension that returns the Jyutping value for
> the article title. Let's say that extension introduces a new magic word
> called {{TITLEINJYUPTING}}. That way you can add
> {{DEFAULTSORT:{{TITLEINJYUPTING}}}} to the bottom of the pages; and for
> exceptional pronunciations you can use {{DEFAULTSORT:[special Jyutping
> pronunciation]}} instead. Alternatively, you could make it a parser
> function like {{DEFAULTSORT:{{#JYUPTING:{{PAGETITLE}}}}}} or something like
> that.
>
> If Jyutping is as predictable as you state then making an extension should
> be a good idea because (a) it can be used by non-WMF wikis too, without
> having to set up Scribunto, etc. and (b) it will be notably faster, because
> it runs directly on PHP, and not through the additional layer of Lua.
>
> On Sun, Mar 15, 2020 at 7:03 PM Deryck Chan <[email protected]> wrote:
>
>> bawolff - Would you be able to point me to an example of mw.loadData?
>>
>> Also, I've subscribed to https://phabricator.wikimedia.org/T46667 .
>>
>> Huji - I was inspired by Japanese Wikipedia's approach to sorting - they
>> have a {{DEFAULTSORT:[article name in hiragana]}} on all articles. Since
>> Cantonese pronunciation is even more predictable than Japanese, we could
>> potentially have a template that automatically adds {{DEFAULTSORT:[article
>> title in Jyutping]}} using a Lua lookup table of all common Chinese
>> characters. Exceptional pronunciations should then be coded individually.
>> The Pinyin implementation of this would be equivalent, though it would
>> depend on the zh.wp community agreeing on sorting things by Pinyin.
>>
>> In terms of storing the data, Wikidata is not a good answer. First up,
>> the Wikidata property creators community has rejected the notion of
>> creating separate properties for each common phonetic transcription system
>> of CJK languages, so the retrieval of the phonetic transcriptions from
>> Jyutping will be unnecessarily complicated. Second, Wikidata items refer to
>> concepts, not titles. We could theoretically ask the script to go to
>> Lexemes to fetch the phonetic transcription but that'll involve untangling
>> the multiple Lexemes that refer to the same Chinese character. In general,
>> the way Wikidata is structured makes it a bad fit for the problem at hand.
>>
>> Liangent's formulation of the problem is more general than the one I
>> described, because T46667 aims to allow multiple ways of sorting Chinese
>> characters within the same interface. That will be much welcome too.
>>
>> On Sun, 15 Mar 2020 at 19:55, bawolff <[email protected]> wrote:
>>
>>> Consider using
>>> https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#mw.loadData
>>> , keeping in mind that Lua isn't really made with the use case of huge data
>>> tables in mind, so there might be limits you run into if your data is
>>> really big.
>>>
>>> --
>>> Bawolff
>>>
>>> On Sun, Mar 15, 2020 at 2:13 PM Deryck Chan <[email protected]>
>>> wrote:
>>>
>>>> Hello Ambassadors - This technical question may be relevant to multiple
>>>> (particularly CJK) language communities so I'm asking it here.
>>>>
>>>> What is the advice for writing a Lua script that needs to look up data
>>>> from a big table (~10k rows at first deployment, potentially increasing in
>>>> the future)? Does one hard-code the data into a Lua script, or is there a
>>>> recommended data structure for storing those?
>>>>
>>>> The design problem at hand is that the Cantonese Wikipedia wants to
>>>> re-sort articles by Jyutping rather than Unicode. This will probably
>>>> involve automating the generation of Jyutping phonetic guides by looking up
>>>> the Jyutping transcription of common Chinese characters using a Lua module.
>>>> Where do we store the data?
>>>>
>>>> If another wiki has done similar things, we'd be interested in sharing
>>>> the infrastructure.
>>>>
>>>> Deryck
>>>> On behalf of the Cantonese Wikipedia community
>>>>
>>>> _______________________________________________
>>>> Wikitech-ambassadors mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors
