Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
On 08/23/2018 06:48 AM, Asmus Freytag (c) via Unicode wrote: On 8/23/2018 3:28 AM, "Jörg Knappen" wrote: Asmus, I know your style of humor, but to keep it straight: All known human languages, even Piraha, have pronouns for "I" and "you". And languages like Japanese, tend to use them - mostly not. Even if the concepts are known, and can be named, there are deep differences across languages concerning the need or conventions for demarcating them with words in any given context. Replacing words by symbols is not going to fix this - the only way to get a 'universal' system of symbolic expression is to invent a new language, with its own conventions for use of these symbols in any given context. It isn't like replacing words with symbols hasn't been tried... I think Francis Lodwick had a "universal symbology" like this in the works in the 1600s. ~mark
Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
Still, pronouns may be universal, but their features aren't... Pronouns in Japanese are not a closed class, and it is not uncommon to use a person's name/title instead of "you". This happens in English and other languages too, in extremely formal speech, even down to conjugating with 3rd-person verb forms. (It's really cool to see the mid-sentence back-and-forth shifting in Biblical Hebrew, e.g. Genesis chapter 44.) All of which is to say, as Asmus did, that even "I" and "you" are not interchangeable pieces between languages, easily symbolized by a single "fits-all-languages" placeholder. ~mark On 08/23/2018 06:28 AM, "Jörg Knappen" via Unicode wrote: Asmus, I know your style of humor, but to keep it straight: All known human languages, even Piraha, have pronouns for "I" and "you". --Jörg Knappen *Sent:* Monday, 20 August 2018 at 16:20 *From:* "Asmus Freytag via Unicode" *To:* unicode@unicode.org *Subject:* Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process) What about languages that don't have or don't use personal pronouns? Their speakers might find their use odd or awkward. The same for many other grammatical concepts: they work reasonably well if used by someone from a related language, or for linguists trained in general concepts, but languages differ so much in what they express explicitly that if any native speaker transcribes the features that are exposed (and not implied) in their native language it may not be what a reader used to a different language is expecting to see. A./
Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
I think Blissymbols could be a separate, well-defined script in Unicode because they are already more or less well defined by their respective groups. This community of interest can lobby for these implementations as a whole instead of multiple individuals separately. Emoji were born in quite a different way and are in no way as well defined as Blissymbols are, for example. There is no self-governing forum of people to discuss the future of emoji and forthcoming additions, obviously because they gained international attention just as they were added to the Unicode Standard, but maybe also because "working with the Emoji Subcommittee" is rather hard. The conversation about Blissymbols made me think about a way to solve the current communication problem, although it might be a bit radical: Why not remove the authority to propose new emoji from the ESC and give it to a dedicated, public emoji community? Such a community could formulate additional guidelines for upcoming emoji, draft roadmaps and send a quarterly proposal to the ESC for individual approval. Unicode members could still express ideas and exercise power through participating in the community and appointing people to the ESC. [image: diagram.png] This change would remove pressure and workload from the ESC while retaining most of the control, especially the last word, but the emoji standard would benefit from a dedicated community. I'm just putting this out there. What are your thoughts on this? Do you think this is unreasonable, or achievable? Julian On Tue, Aug 21, 2018 at 3:25 PM James Kass via Unicode wrote: > Rebecca Bettencourt wrote, > > > Why don't we just get Blissymbolics encoded as it is? > > The Pipeline still has the Everson proposal from 1998, but Blissymbols > are still in the Pipeline. > > Scripts Encoding Initiative > ( http://linguistics.berkeley.edu/sei/ ) > page, > http://linguistics.berkeley.edu/sei/scripts-not-encoded.html > shows Blissymbols and links the same proposal. 
> > Blissymbolics Communication International, > http://www.blissymbolics.org/ > will likely produce the next proposal. > > Both Scripts Encoding Initiative and Blissymbolics Communication > International depend upon funding. >
Emacs Verbose Character Entry (was Private Use Areas)
On Thu, 23 Aug 2018 21:47:03 +0200 "Janusz S. Bień via Unicode" wrote: > My needs are very simple, for example C-x 8 Return LATIN CAPITAL > LETTER A WITH MACRON AND BREVE [MUFI] should yield the character with > the code E010. I can provide the list of names and codes. While it should obviously yield, if anything, or for 'LATIN CAPITAL LETTER A WITH MACRON AND BREVE', it would probably be more important to recognise formal aliases, such as 'LAO LETTER LO' for the input of the Lao letter lo ling (U+0EA5 LAO LETTER LO LOOT), not to be confused with the Lao letter lo lot (a.k.a. ro rot), U+0EA3 LAO LETTER LO LING. For , I prefer to type "A\_M_X", but then I learnt XSAMPA. Richard.
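The point about formal aliases can be illustrated outside Emacs. The following is only a sketch in Python, not a description of how C-x 8 Return works: it assumes that the standard `unicodedata` module accepts formal aliases in `lookup()` (supported since Python 3.3), while `name()` keeps returning the original, immutable character name. The helper `describe` is a made-up name for this illustration.

```python
import unicodedata


def describe(name_or_alias: str) -> str:
    """Resolve a character name or formal alias and report what was found."""
    ch = unicodedata.lookup(name_or_alias)  # raises KeyError if unknown
    # name() returns the original character name, never the alias.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # The two Lao letters whose names were swapped in the standard.
    for alias in ("LAO LETTER LO", "LAO LETTER RO"):
        print(alias, "->", describe(alias))
```

An input method that only knew the original names would send a user typing the sensible name to the wrong letter, which is why alias support matters more here than PUA names.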
Re: Private Use areas
On Thu, 23 Aug 2018 20:34:20 +0200 "Janusz S. Bień via Unicode" wrote: > This is a typical but IMHO obsolete perspective. Fonts are for > *rendering*, new characters and variants are more and more often > needed for *input* of real life old texts with sufficient precision. If we're talking about glyphs which don't actually correspond to new characters, then that sounds like a good case for private use variation selectors. To quote Tully, "Abusus non tollit usum". Richard.
Re: Private Use areas
On Thu, Aug 23 2018 at 22:17 +0300, e...@gnu.org writes: >> Date: Thu, 23 Aug 2018 20:30:52 +0200 >> Cc: Richard Wordingham >> From: "Janusz S. Bień via Unicode" >> >> >> and in Emacs - to my disappointment it looks like the Unicode data are >> >> set at the compile time, but perhaps this can be negotiated with the >> >> developers. >> > >> > Can you be more specific? >> >> I often search characters by name with C-x 8 Return. I would like to use >> it also for MUFI characters, I already have the name list (the example >> directory at https://bitbucket.org/jsbien/unihistext/). I haven't looked >> very closely into the problem and don't remember the details now, but my >> impression was that it's not simple. > > What is "it" in the last sentence? IOW, what is not simple about that > with Emacs? I'm very glad you joined the discussion. My needs are very simple: for example, C-x 8 Return LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI] should yield the character with the code E010. I can provide the list of names and codes. > > It is true that the Unicode related data is produced at build time, > but only some of that is actually recorded in the Emacs binary, the > rest is loaded upon demand. But all the data is stored in data > structures that are mutable, given some Lisp programming. I was never fluent in Lisp programming and by now I have forgotten almost everything I knew, so it's not a task for me. I was thinking about submitting a feature request, but I have also forgotten the proper procedure for doing so. Moreover I had the impression that I'm the only person who needs it... > > (It is not clear to me which part of the Unicode data you would like > to change; are you talking about adding characters to the list of > those defined by Unicode? If you are using the PUA codepoints, it's > possible that you will need to update Emacs's notion of PUA as well.) Yes, I would like the PUA codepoints to be handled analogously to the proper ones. What do you mean by Emacs's notion of PUA? 
Best regards Janusz -- , Janusz S. Bien emeryt (emeritus) https://sites.google.com/view/jsbien
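The requested behaviour (a name carrying a "[MUFI]" tag resolving to a PUA code point) can be sketched as a thin layer over a standard name lookup. This is a Python illustration under stated assumptions, not Emacs code: the table `MUFI_NAMES` and the functions `lookup_char` and `is_private_use` are hypothetical names, and the only entry in the table is the one example given in the message above.

```python
import unicodedata

# Hypothetical private name table. The single entry is the example from
# the message above; a real table would be loaded from a name list.
MUFI_NAMES = {
    "LATIN CAPITAL LETTER A WITH MACRON AND BREVE [MUFI]": 0xE010,
}


def is_private_use(ch: str) -> bool:
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


def lookup_char(name: str, private_names=MUFI_NAMES) -> str:
    """Try the standard names first, then fall back to the private table."""
    key = name.upper()
    try:
        return unicodedata.lookup(key)
    except KeyError:
        if key not in private_names:
            raise  # unknown in both tables
        return chr(private_names[key])


if __name__ == "__main__":
    ch = lookup_char("latin capital letter a with macron and breve [mufi]")
    print(f"U+{ord(ch):04X}", "private use:", is_private_use(ch))
```

Trying the standard names first means a private table can never shadow a name that Unicode itself defines, which seems the safer default for an agreement that is not universally shared.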
Re: Private Use areas
On Thu, Aug 23, 2018 at 5:10 AM, Janusz S. Bień wrote:

> > I already provide this myself for my uses of the PUA as well as the
> > CSUR and any vendor-specific agreements I can find:
> >
> > http://www.kreativekorp.com/charset/PUADATA/
>
> I would prefer to see the data in a repository, so others can
> comment and contribute.

That is actually my intent for the future. Though it's not quite ready yet: https://github.com/kreativekorp/charset/tree/master/puadata

That's the data in a "pre-compiled" form; it's turned into a "proper" PUADATA directory using this script: https://github.com/kreativekorp/charset/blob/master/bin/build-public.py

> As for "any vendor-specific agreements", do MUFI and LINCUA qualify?

I certainly do want to see MUFI and LINCUA provided in this form, but I put them in a different category along with CSUR. I basically have three categories of PUA agreements:

Fonts - PUA assignments specific to a font family, e.g. Constructium, Fairfax, Nishiki-teki, Quivira, Junicode, etc.

Public - PUA agreements meant to be widely used, e.g. CSUR, UCSUR, MUFI, LINCUA, etc.

Vendors - PUA assignments meant to be used by a single vendor or platform, e.g. Adobe, Apple, etc. but also Linux, MirOS, etc.

Thank you for those links by the way. I had tried to find charts for MUFI in the past but had somehow been unsuccessful.

> > Of course there is no way to get software to use this information.
>
> What kind of software do you have in mind?

Unicode-related utilities, text editors to start with. You pretty much hit the nail on the head with uniname and emacs as examples. :)
Re: Private Use areas
On Thu, Aug 23 2018 at 17:26 +0100, unicode@unicode.org writes: > On Thu, 23 Aug 2018 17:39:15 +0200 > Philippe Verdy via Unicode wrote: > >> You make a confusion: I do not propose "hacking" existing codes, but >> instead adding new codes for private variations. It's then up to PUV >> sequence authors to choose an appropriate base character that can >> have the properties they want to be inherited by the private-use >> variation sequence, or to choose a base character that will provide >> some reasonable reading if rendered as is (by renderers or fonts >> not implementing the private variation sequence, given that they will >> also append a symbol for the PUV itself after the standard character). > > Variation sequences cannot be used to add new characters. Most PUA > characters are used to represent new characters. A > standard-conformant private variation sequence would generally achieve > the same effect as could be achieved by a font feature (typically one > of the cvxx, though possibly one of the ssxx), This is a typical but IMHO obsolete perspective. Fonts are for *rendering*; new characters and variants are more and more often needed for *input* of real life old texts with sufficient precision. Best regards Janusz -- , Janusz S. Bien emeryt (emeritus) https://sites.google.com/view/jsbien
Re: Private Use areas
On Thu, Aug 23 2018 at 17:11 +0100, unicode@unicode.org writes: > On Thu, 23 Aug 2018 14:10:35 +0200 > "Janusz S. Bień via Unicode" wrote: > >> What kind of software do you have in mind? >> >> I'm primarily interested in the locally developed programs >> >> https://bitbucket.org/jsbien/unihistext/ >> >> https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/ > > It looks as though the security certificates are awry - has someone > forgotten to pay the protection money to the right people? (Firefox > objects with "The page you are trying to view cannot be shown because > the authenticity of the received data could not be verified.") I see no such problems with Firefox ESR 52.9.0 on Debian testing. Moreover the program reports that the certificate is valid till 04/21/2020. > >> and in Emacs - to my disappointment it looks like the Unicode data are >> set at the compile time, but perhaps this can be negotiated with the >> developers. > > Can you be more specific? I often search characters by name with C-x 8 Return. I would like to use it also for MUFI characters, I already have the name list (the example directory at https://bitbucket.org/jsbien/unihistext/). I haven't looked very closely into the problem and don't remember the details now, but my impression was that it's not simple. Best regards Janusz -- , Janusz S. Bien emeryt (emeritus) https://sites.google.com/view/jsbien
Re: Private Use areas
On Thu, 23 Aug 2018 at 18:31, Richard Wordingham via Unicode <unicode@unicode.org> wrote: > On Thu, 23 Aug 2018 17:39:15 +0200 > Philippe Verdy via Unicode wrote: > > > You make a confusion: I do not propose "hacking" existing codes, but > > instead adding new codes for private variations. It's then up to PUV > > sequence authors to choose an appropriate base character that can > > have the properties they want to be inherited by the private-use > > variation sequence, or to choose a base character that will provide > > some reasonable reading if rendered as is (by renderers or fonts > > not implementing the private variation sequence, given that they will > > also append a symbol for the PUV itself after the standard character). > > Variation sequences cannot be used to add new characters. Do you remember that I did not speak about existing variation sequences? Only about the new encoding of private-use variation sequences, which do not have to obey the policy of existing VS, and whose purpose would be to inherit most properties (notably direction, breaking, spacing, general category of another existing character). > Most PUA > characters are used to represent new characters. I did not speak about PUAs either.
Re: Private Use areas
On Thu, 23 Aug 2018 17:39:15 +0200 Philippe Verdy via Unicode wrote: > You make a confusion: I do not propose "hacking" existing codes, but > instead adding new codes for private variations. It's then up to PUV > sequence authors to choose an appropriate base character that can > have the properties they want to be inherited by the private-use > variation sequence, or to choose a base character that will provide > some reasonable reading if rendered as is (by renderers or fonts > not implementing the private variation sequence, given that they will > also append a symbol for the PUV itself after the standard character). Variation sequences cannot be used to add new characters. Most PUA characters are used to represent new characters. A standard-conformant private variation sequence would generally achieve the same effect as could be achieved by a font feature (typically one of the cvxx, though possibly one of the ssxx), though using font features would be fiddlier and have more limited support, and variation sequences would facilitate data processing. Richard.
Re: Private Use areas
On Thu, 23 Aug 2018 14:10:35 +0200 "Janusz S. Bień via Unicode" wrote: > What kind of software do you have in mind? > > I'm primarily interested in the locally developed programs > > https://bitbucket.org/jsbien/unihistext/ > > https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/ It looks as though the security certificates are awry - has someone forgotten to pay the protection money to the right people? (Firefox objects with "The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.") > and in Emacs - to my disappointment it looks like the Unicode data are > set at the compile time, but perhaps this can be negotiated with the > developers. Can you be more specific? For Indic rearrangement I had to define syllables myself with definitions which I then added to composition-function-table. Unfortunately, I then hit the problem that I had to define Indic rearrangement myself, and OpenType fonts fall into several incompatible families, which is why I haven't released a general solution. My emacs kit for Tai Tham is given via http://www.wrdingham.co.uk/lanna/toolkit.html (a probable kinsman got the 'o'), but there are a lot of odds and ends that need sorting out. I would expect that you would be able to override any relevant 'compiler' settings via your Emacs start up file - I expect Eli Zaretskii will be along soon with more details. Of course, you could always revert to the old tradition and recompile Emacs yourself - though it may need something like MinGW to compile for Windows. Richard.
Re: Private Use areas
You make a confusion: I do not propose "hacking" existing codes, but instead adding new codes for private variations. It's then up to PUV sequence authors to choose an appropriate base character that can have the properties they want to be inherited by the private-use variation sequence, or to choose a base character that will provide some reasonable reading if rendered as is (by renderers or fonts not implementing the private variation sequence, given that they will also append a symbol for the PUV itself after the standard character). Also, I do not want to change anything about existing variation sequences (using VS1 and so on) or their encoding policies, which require prior registration and standardisation. On Thu, 23 Aug 2018 at 11:42, Richard Wordingham via Unicode <unicode@unicode.org> wrote: > On Wed, 22 Aug 2018 11:58:58 +0200 > Philippe Verdy via Unicode wrote: > > > For now there's still no way to have variant sequences unless they are > > registered and standardized by Unicode but registration should not be > > needed (forbidden) for sequences containing PUV. > > I believe this scheme is no worse than hack encodings that use Latin > character codes for other characters. These schemes often work. > (Indeed, the currently best method of getting Tai Tham displayed as rich > text that I can find is to use a transliteration-type encoding and a > special font, though I can now get pretty close using the proper > character codes in the order laid down in the proposals.) > > The major problems I can see with appropriating variation sequences > are: > (1) It might be restricted to base characters - I have no > experimental evidence on whether this would happen. Fonts can happily > convert base characters to combining characters, though this works > best if Latin line-breaking rules take effect. > > (2) The appropriated variation sequence might be assigned a meaning - > but this is no worse than the general ambiguity of PUA characters. 
> > (3) Some base characters get special treatment. For example, I had > to change my transliteration scheme because hyphen-minus is treated > specially by MS Edge - I was using it as a digraph disjunctor - and > so clusters were not being formed. In this case, I would have come > unstuck as soon as line-wrapping started, so it was a bad choice anyway. > > Or are there significant renderers that deliberately ignore variation > selectors in unregistered, unstandardised variation sequences? I don't > recall any problems from when we were discussing variation > sequences for chess pieces. > > For supplementing a script, it might be best to start at > VARIATION-SELECTOR-256, and work down if need be with specialist > characters. > > Richard. >
Re: Private Use areas
On Tue, Aug 21 2018 at 11:23 -0700, unicode@unicode.org writes: > On Tue, Aug 21, 2018 at 10:21 AM, Janusz S. Bień via Unicode > wrote: > > I think PUA users should provide the > properties of the characters used in a form analogous to that of Unicode > itself, and the software should be able to use this additional > information. > > I already provide this myself for my uses of the PUA as well as the > CSUR and any vendor-specific agreements I can find: > > http://www.kreativekorp.com/charset/PUADATA/ I would prefer to see the data in a repository, so others can comment and contribute. As for "any vendor-specific agreements", do MUFI and LINCUA qualify? https://folk.uib.no/hnooh/mufi/ http://andron-typeforum.xobor.de/t10f13-Towards-a-linguistic-corporate-use-area-LINCUA.html > > Of course there is no way to get software to use this information. What kind of software do you have in mind? I'm primarily interested in the locally developed programs https://bitbucket.org/jsbien/unihistext/ https://bitbucket.org/jsbien/fntsample-fork-with-ucd-comments/ and in Emacs - to my disappointment it looks like the Unicode data are set at the compile time, but perhaps this can be negotiated with the developers. Best regards Janusz -- , Janusz S. Bien emeryt (emeritus) https://sites.google.com/view/jsbien
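Supplying PUA properties "in a form analogous to that of Unicode itself" could mean shipping lines in the semicolon-separated, 15-field layout of UnicodeData.txt. The sketch below reads one such line in Python. It makes no claim about the actual PUADATA files or about MUFI's data: the record type `CharRecord`, the function `parse_unicodedata_line` and the sample line (including the Lu and L property values chosen for E010) are assumptions made for the example.

```python
from typing import NamedTuple


class CharRecord(NamedTuple):
    codepoint: int
    name: str
    general_category: str
    bidi_class: str


def parse_unicodedata_line(line: str) -> CharRecord:
    """Parse one line in the 15-field UnicodeData.txt layout."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # Field 0: code point, 1: name, 2: general category, 4: bidi class.
    return CharRecord(int(fields[0], 16), fields[1], fields[2], fields[4])


# Hypothetical PUA entry; the property values are assumed, not taken from MUFI.
SAMPLE = "E010;LATIN CAPITAL LETTER A WITH MACRON AND BREVE;Lu;0;L;;;;;N;;;;;"

if __name__ == "__main__":
    print(parse_unicodedata_line(SAMPLE))
```

Reusing the standard layout means that a tool which already parses the Unicode Character Database could, in principle, read a private agreement's file with the same code path.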
Re: Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
On 8/23/2018 3:28 AM, "Jörg Knappen" wrote: Asmus, I know your style of humor, but to keep it straight: All known human languages, even Piraha, have pronouns for "I" and "you". And languages like Japanese, tend to use them - mostly not. Even if the concepts are known, and can be named, there are deep differences across languages concerning the need or conventions for demarcating them with words in any given context. Replacing words by symbols is not going to fix this - the only way to get a 'universal' system of symbolic expression is to invent a new language, with its own conventions for use of these symbols in any given context. A./ --Jörg Knappen *Sent:* Monday, 20 August 2018 at 16:20 *From:* "Asmus Freytag via Unicode" *To:* unicode@unicode.org *Subject:* Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process) What about languages that don't have or don't use personal pronouns? Their speakers might find their use odd or awkward. The same for many other grammatical concepts: they work reasonably well if used by someone from a related language, or for linguists trained in general concepts, but languages differ so much in what they express explicitly that if any native speaker transcribes the features that are exposed (and not implied) in their native language it may not be what a reader used to a different language is expecting to see. A./
Aw: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
Asmus, I know your style of humor, but to keep it straight: All known human languages, even Piraha, have pronouns for "I" and "you". --Jörg Knappen Sent: Monday, 20 August 2018 at 16:20 From: "Asmus Freytag via Unicode" To: unicode@unicode.org Subject: Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process) What about languages that don't have or don't use personal pronouns? Their speakers might find their use odd or awkward. The same for many other grammatical concepts: they work reasonably well if used by someone from a related language, or for linguists trained in general concepts, but languages differ so much in what they express explicitly that if any native speaker transcribes the features that are exposed (and not implied) in their native language it may not be what a reader used to a different language is expecting to see. A./
Re: Private Use areas
On Wed, 22 Aug 2018 11:58:58 +0200 Philippe Verdy via Unicode wrote: > For now there's still no way to have variant sequences unless they are > registered and standardized by Unicode but registration should not be > needed (forbidden) for sequences containing PUV. I believe this scheme is no worse than hack encodings that use Latin character codes for other characters. These schemes often work. (Indeed, the currently best method of getting Tai Tham displayed as rich text that I can find is to use a transliteration-type encoding and a special font, though I can now get pretty close using the proper character codes in the order laid down in the proposals.) The major problems I can see with appropriating variation sequences are: (1) It might be restricted to base characters - I have no experimental evidence on whether this would happen. Fonts can happily convert base characters to combining characters, though this works best if Latin line-breaking rules take effect. (2) The appropriated variation sequence might be assigned a meaning - but this is no worse than the general ambiguity of PUA characters. (3) Some base characters get special treatment. For example, I had to change my transliteration scheme because hyphen-minus is treated specially by MS Edge - I was using it as a digraph disjunctor - and so clusters were not being formed. In this case, I would have come unstuck as soon as line-wrapping started, so it was a bad choice anyway. Or are there significant renderers that deliberately ignore variation selectors in unregistered, unstandardised variation sequences? I don't recall any problems from when we were discussing variation sequences for chess pieces. For supplementing a script, it might be best to start at VARIATION-SELECTOR-256, and work down if need be with specialist characters. Richard.
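The "start at the top and work down" idea can be made concrete with a little arithmetic on the standard variation selector ranges. This is a rough Python sketch, not a recommendation: Unicode defines no private-use variation selectors, so any sequence built this way is unregistered, and the function names (`is_variation_selector`, `variation_sequences`, `selectors_from_top`) are invented for the illustration. The Mongolian free variation selectors are deliberately left out.

```python
from typing import List, Tuple

# Standard variation selector ranges:
# U+FE00..U+FE0F (VS1 to VS16) and U+E0100..U+E01EF (VS17 to VS256).
VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)


def variation_sequences(text: str) -> List[Tuple[str, str]]:
    """Return (base, selector) pairs for each selector that follows a base."""
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if is_variation_selector(cur) and not is_variation_selector(prev):
            pairs.append((prev, cur))
    return pairs


def selectors_from_top(count: int) -> List[str]:
    """Hand out selectors starting at U+E01EF (VS256) and working down."""
    return [chr(0xE01EF - i) for i in range(count)]


if __name__ == "__main__":
    vs256 = selectors_from_top(1)[0]
    print([(b, f"U+{ord(s):04X}") for b, s in variation_sequences("a" + vs256 + "b")])
```

Counting down from the top keeps such experimental sequences as far as possible from the low-numbered selectors, where registered sequences are most likely to be assigned.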