I.

    Which and where?

Section 3.7.1, "Simplified and Traditional Chinese Variants", discusses converting between Simplified and Traditional Chinese.
You wrote:

        http://www.unicode.org/reports/tr38/ does a good summary of
        the possibilities.

in response to my inquiry about "examples of meaning-divergent z-variant words in modern Mandarin" and appropriate "algorithms and data structures". Also, the Unihan database doesn't provide collocational data for T/S conversion.


II.

simplification is also found in for example Japanese CJK ideographs which is documented
Contextual conversion (and shifting/"transposition") is essentially not an issue in this context, even though you have an odd case of deviation here and there.

Some dialects such as Cantonese are quite well documented
[and]
There has been increased interest in such things in recent years. One person's 'hand-tuned' of today can become the basis of a standard of tomorrow.

1a. I'd say I have a decent grasp of lexical variation in written Cantonese, based on a fair amount of fieldwork. (While we're at it, I also know at least one researcher with an interest in standardizing Cantonese spelling.) I'm certain that lexical variation in Cantonese is not well documented, though there are a number of sources from which you can scrape something together yourself.

1b. Keep in mind that most materials in electronic form (whether originally written that way or digitized) don't use the "best" character choices; needless to say, this is even truer for other Sinitic languages.

2. This is entirely unrelated to the question of whether one can or should describe simplified characters as "abbreviated". There is a connection to your statement about things being on a sliding scale (you used the word "relative"), but for Cantonese this translates into a lot of inconsistency between a genuine Cantonese (C) spelling, a Mandarin (M) substitute, a C-based phonetic transcription, ad-hoc usage with the mouth radical or a prefixed roman "o", an English-based informal transcription in Latin letters, and avoidance. Whether this is electronically manageable in principle depends on whether you include entirely romanized blogs (which I wouldn't recommend), but in any case anything other than liberal QE (query expansion) will /not/ work. (I might previously have misused the word "folding" to mean "conversion".)

3. Other Sinitic languages are essentially not standardized at all (we're talking about Chinese characters here, not romanizations). Last time I checked, Taiwanese seemed to be a total mess, and for Shanghainese there is a (mainland-CN) researcher who is (still) writing a dictionary to actually find or document written representations of all syllable-"morphemes", so as to capture all of Shanghainese. The best Shanghainese textbook was published a couple of years ago in HK and uses traditional characters (!) to represent modern Shanghainese.
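
To make the QE point in 2. a bit more concrete, here is a minimal sketch in Python of what I mean by liberal query expansion: every query term is replaced by the whole group of spellings it might appear as. The names (VARIANT_GROUPS, build_index, expand_query) are made up for illustration, and the two variant groups are just a few commonly seen spellings I picked as examples, not a vetted resource; a real table would have to be compiled by hand from the kinds of sources mentioned in 1a.

```python
# Hypothetical, hand-made variant table (illustrative entries only).
# Each set groups spellings a searcher may want to treat as one word.
VARIANT_GROUPS = [
    # possessive particle: C spelling, bare phonetic stand-in,
    # prefixed roman "o", informal Latin spelling, M substitute
    {"嘅", "既", "o既", "ge", "的"},
    # perfective marker: C spelling, bare phonetic stand-in,
    # informal Latin spelling
    {"咗", "左", "jor"},
]


def build_index(groups):
    """Map every spelling to the full set of spellings in its group."""
    index = {}
    for group in groups:
        for form in group:
            index.setdefault(form, set()).update(group)
    return index


def expand_query(terms, index):
    """For each query term, return the set of forms to OR together.

    Terms without an entry in the index are passed through unchanged.
    """
    return [set(index.get(term, {term})) for term in terms]


if __name__ == "__main__":
    index = build_index(VARIANT_GROUPS)
    for term, forms in zip(["食", "咗"], expand_query(["食", "咗"], index)):
        print(term, "->", " OR ".join(sorted(forms)))
```

Note that the expansion is deliberately symmetric (searching for the M substitute also pulls in the C spellings), which costs precision; that trade-off is what "liberal" means here.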

Stephan
