Re: Private Use areas
On Wednesday, 22 August 2018, Mark E. Shoulson via Unicode <unicode@unicode.org> wrote:

> On 08/21/2018 02:03 PM, Ken Whistler via Unicode wrote:
>
>>
>
> Best we can do is shout loudly at OpenType tables and hope to cram in
> behavior (or at least appearance, which is more likely all we can get) that
> vaguely resembles what we're after. And that's not SO awful, given what
> we're dealing with.

At the moment I am looking at implementing three unencoded Arabic characters in the PUA. For the foreseeable future OpenType is a non-starter, so I will look at implementing them in Graphite tables in a font.

Andrew

--
Andrew Cunningham
lang.supp...@gmail.com
Re: Private Use areas
On 08/21/2018 02:03 PM, Ken Whistler via Unicode wrote:

> On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
>> On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
>>> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
>>>> Is there a block of RTL PUA also?
>>> No.
>> Perhaps there should be?
>
> This is a periodic suggestion that never goes anywhere--for good reason.
> (You can search the email archives and see that it keeps coming up.)
>
> Presuming that this question was asked in good faith...

Yeah, I know there has been talk about such things, and I also knew that whether or not there was an RTL block (which I did not remember for certain), there weren't going to be any *changes* in the PUA, and we were going to have to make do with what there was. There's no way to anticipate all the possible properties people would want in the PUA, though I remember thinking it was probably wrong to make the PUA *strongly* LTR; I know there's a not-strongly flavor too. Best we can do is shout loudly at OpenType tables and hope to cram in behavior (or at least appearance, which is more likely all we can get) that vaguely resembles what we're after. And that's not SO awful, given what we're dealing with.

> As I see it, the only feasible way for people to get specialized behavior
> for PUA ranges involves first ceasing to assume that somehow they can
> jawbone the UTC into *standardizing* some ranges for some particular use
> or another. That simply isn't going to happen. People who assume this is
> somehow easy, and that the UTC are a bunch of boneheads who stand in the
> way of obvious solutions, do not -- I contend -- understand the
> complicated interplay of character properties, stability guarantees, and
> implementation behavior baked into system support libraries for the
> Unicode Standard.

The whole point of the PUA is that it *isn't* standardized (by the UTC).

It might have been nice to make some more varied choices of things that couldn't be left unspecified, but you're still going to wind up with "but there aren't any PUA codepoints that are JUST what I need!" And, as said, it's too late now.

~mark
Re: Private Use areas
On Tue, Aug 21, 2018 at 3:02 PM Doug Ewell via Unicode wrote:

> Ken Whistler wrote:
>
> > The way forward for folks who want to do this kind of thing is:
> >
> > 1. Define a *protocol* for reliable interchange of custom character
> > property information about PUA code points.
>
> I've often thought that would be a great idea. You can't get to steps 2
> and 3 without step 1. I'd gladly participate in such a project.

As would I.
Re: Private Use areas
Ken Whistler wrote:

> The way forward for folks who want to do this kind of thing is:
>
> 1. Define a *protocol* for reliable interchange of custom character
> property information about PUA code points.

I've often thought that would be a great idea. You can't get to steps 2 and 3 without step 1. I'd gladly participate in such a project.

--
Doug Ewell | Thornton, CO, US | ewellic.org
Re: Private Use areas
On Tue, Aug 21, 2018 at 11:03:41AM -0700, Ken Whistler via Unicode wrote:
>
> On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:
> > On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
> > > On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
> > > > Is there a block of RTL PUA also?
> > > No.
> > Perhaps there should be?
>
> This is a periodic suggestion that never goes anywhere--for good reason.
> (You can search the email archives and see that it keeps coming up.)
>
> Presuming that this question was asked in good faith...

Oif, looks like mere months of inattentive lurking are not enough (the thread I got pointed to was from 2011). Apologies.

> > or perhaps by allocating a new range elsewhere.
> See:
>
> https://www.unicode.org/policies/stability_policy.html
>
> The General_Category property value Private_Use (Co) is immutable: the set
> of code points with that value will never change.
>
> That guarantee has been in place since 1996, and is a rule that binds the
> UTC. So nope, sorry, no more PUA ranges.

Right.

> The way forward for folks who want to do this kind of thing is:
>
> 1. Define a *protocol* for reliable interchange of custom character property
> information about PUA code points.
[...]
> And if the goal for #3 is to get some *system* implementer to support the
> protocol in widespread software, then before starting any of #1, #2, or #3,
> you had better start instead with:
>
> 0. Create a consortium (or other ongoing organization) with a 10-year time
> horizon and participation by at least one major software implementer, to
> define, publicize, and advocate for support of the protocol.

Heh, good point.

I wonder, perhaps a long-lived consortium tasked with assigning properties to characters already exists?

So your answer _does_ provide a way to go: any PUA use that's no longer private, or any problem someone has with character properties, should go through official channels here instead of inventing a standard of one's own.

With my existing hats on (Debian fonts team member, and someone who messes with terminals in general) I already have two such itches to scratch. Thus, it sounds like I should do the research, prepare a write-up, and then come back to harass you folks with inane questions. Inventing new solutions that work around instead of with you is a bad idea...

Meow!
--
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
Re: Private Use areas
On Tue, 21 Aug 2018 11:03:41 -0700 Ken Whistler via Unicode wrote:

> On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:

> Really? Suppose someone wants to implement a bicameral script in PUA.
> They would need case mappings for that, and how would those be
> "better represented in the font itself"? Or how about digits? Would
> numeric values for digits be "better represented in the font itself"?
> How about implementation of punctuation? Would segmentation
> properties and behavior be "better represented in the font itself"?

The least intrusive way of defining the meaning of a graphic (sensu lato) character is by a font, in a very wide sense that would interpret a Unicode code chart as a font. Without a font in this sense, normal characters in the PUA have no meaning.

If one insists on a font to have an interpretation, then:

(1) PUA characters in plain text are meaningless - I believe that's pretty much the position now.

(2) Different schemes can co-exist, even within the same formatted document, by having different formats. This is the case now.

It then makes sense to store the properties in the font, which needs to be saved with or in the document for the document to continue to make sense.

Casing and digits are luxuries. Are we not told that searching should be done by collation? We then do not need case-folding! Interpreting the preferred representation of Roman numerals does not use Unicode properties beyond the approximate principle of one character, one codepoint.

As to segmentation, my understanding was that there were no characters available to indicate word boundaries in scriptio continua; the closest one has is line-breaking suggestions. If my memory serves me right, SIL Graphite fonts can hold line-breaking information.

Richard.
Re: Private Use areas
On Tue, Aug 21, 2018 at 10:21 AM, Janusz S. Bień via Unicode <unicode@unicode.org> wrote:

> I think PUA users should provide the
> properties of the characters used in a form analogical to the Unicode
> itself, and the software should be able to use this additional
> information.

I already provide this myself for my uses of the PUA as well as the CSUR and any vendor-specific agreements I can find:

http://www.kreativekorp.com/charset/PUADATA/

Of course there is no way to get software to use this information.

I have entertained the idea of being able to embed this information into the font itself as OpenType tables, e.g.:

PUAB -> Blocks.txt
PUAC -> CaseFolding.txt
PUAW -> EastAsianWidth.txt
PUAL -> LineBreak.txt
PUAD -> UnicodeData.txt

I've actually invented table names for the majority of UCD files, but those are probably the most relevant. The table names for the more obscure files get rather... creative, e.g.:

PUA[ -> BidiBrackets.txt
PUA] -> BidiMirroring.txt

That alone may get some people to think twice about this idea. :P
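As a rough illustration of what consuming such data could look like, the sketch below parses a few records laid out in the semicolon-separated field order of UnicodeData.txt. The sample records, the character names, and the names PuaRecord and parse_pua_records are invented for this example; they are not taken from the PUADATA files or from any existing tool.

```python
from typing import Dict, NamedTuple


class PuaRecord(NamedTuple):
    """The few UnicodeData.txt fields this sketch keeps."""
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


# Hypothetical sample in UnicodeData.txt field order (15 fields per line).
# These assignments are made up for illustration only.
SAMPLE = """\
E000;EXAMPLE RTL LETTER;Lo;0;R;;;;;N;;;;;
E001;EXAMPLE NONSPACING MARK;Mn;0;NSM;;;;;N;;;;;
"""


def parse_pua_records(text: str) -> Dict[int, PuaRecord]:
    """Map each code point to the properties declared for it."""
    records: Dict[int, PuaRecord] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        code_point = int(fields[0], 16)
        records[code_point] = PuaRecord(
            name=fields[1],
            general_category=fields[2],
            combining_class=int(fields[3]),
            bidi_class=fields[4],
        )
    return records


records = parse_pua_records(SAMPLE)
for cp, rec in sorted(records.items()):
    print(f"U+{cp:04X} {rec.name}: gc={rec.general_category} bc={rec.bidi_class}")
```

Reusing the UCD field layout means a private data file can be read with the same kind of parser as the standard files, which is the appeal of publishing PUA properties in this form.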
Re: Private Use areas
On 8/21/2018 7:56 AM, Adam Borowski via Unicode wrote:

> On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
>> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
>>> Is there a block of RTL PUA also?
>> No.
> Perhaps there should be?

This is a periodic suggestion that never goes anywhere--for good reason. (You can search the email archives and see that it keeps coming up.)

Presuming that this question was asked in good faith...

> What about designating a part of the PUA to have a specific property?

The problem with that is that assigning *any* non-default property to any PUA code point would break existing implementations' assumptions about PUA character properties and potentially create havoc with existing use.

> Only certain properties matter enough:

That is an un-demonstrated assertion that I don't think you have thought through sufficiently.

> * wide
> * RTL

RTL is not some binary counterpart of LTR. There are 23 values of Bidi_Class, and anyone who wanted to implement a right-to-left script in PUA might well have to make use of multiple values of Bidi_Class. Also, there are two major types of strong right-to-leftness: Bidi_Class=R and Bidi_Class=AL. Should a "RTL PUA" zone favor Arabic type behavior or non-Arabic type behavior?

> * combining

Also not a binary switch. Canonical_Combining_Class is a numeric value, and any value but ccc=0 for a PUA character would break normalization. Then for the General_Category, there are three types of "marks" that count as combining: gc=Mn, gc=Mc, gc=Me. Which of those would be favored in any PUA assignment?

> as most others are better represented in the font itself.

Really? Suppose someone wants to implement a bicameral script in PUA. They would need case mappings for that, and how would those be "better represented in the font itself"? Or how about digits? Would numeric values for digits be "better represented in the font itself"? How about implementation of punctuation? Would segmentation properties and behavior be "better represented in the font itself"?

> This could be done either by parceling one of existing PUA ranges: planes 15
> and 16 are virtually unused thus any damage would be negligible;

That is simply an assertion -- and not the kind of assertion that the UTC tends to accept on spec. I rather suspect that there are multiple participants on this email list, for example, who *do* have implementations making extensive use of Planes 15/16 PUA code points for one thing or another.

> or perhaps by allocating a new range elsewhere.

See:

https://www.unicode.org/policies/stability_policy.html

The General_Category property value Private_Use (Co) is immutable: the set of code points with that value will never change.

That guarantee has been in place since 1996, and is a rule that binds the UTC. So nope, sorry, no more PUA ranges.

> Meow!

Grrr! ;-)

As I see it, the only feasible way for people to get specialized behavior for PUA ranges involves first ceasing to assume that somehow they can jawbone the UTC into *standardizing* some ranges for some particular use or another. That simply isn't going to happen. People who assume this is somehow easy, and that the UTC are a bunch of boneheads who stand in the way of obvious solutions, do not -- I contend -- understand the complicated interplay of character properties, stability guarantees, and implementation behavior baked into system support libraries for the Unicode Standard.

The way forward for folks who want to do this kind of thing is:

1. Define a *protocol* for reliable interchange of custom character property information about PUA code points.

2. Convince more than one party to actually *use* that protocol to define sets of interchangeable character property definitions.

3. Convince at least one implementer to support that protocol to create some relevant interchangeable *behavior* for those PUA characters.

And if the goal for #3 is to get some *system* implementer to support the protocol in widespread software, then before starting any of #1, #2, or #3, you had better start instead with:

0. Create a consortium (or other ongoing organization) with a 10-year time horizon and participation by at least one major software implementer, to define, publicize, and advocate for support of the protocol.

(And if you expect a major software implementer to participate, you might need to make sure you have a business case defined that would warrant such a 10-year effort!)

--Ken
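The default property values that any private scheme has to work against can be inspected with Python's standard unicodedata module. The sketch below is only an illustration: the helper name default_properties and the choice of sample code points (the first code point of each Private Use range) are assumptions made for this example.

```python
import unicodedata

# First code point of each of the three Private Use ranges.
SAMPLES = {
    "BMP PUA": 0xE000,
    "Plane 15 PUA": 0xF0000,
    "Plane 16 PUA": 0x100000,
}


def default_properties(code_point):
    """Return the standard property values the library reports for a code point."""
    ch = chr(code_point)
    return {
        "gc": unicodedata.category(ch),         # General_Category
        "bidi": unicodedata.bidirectional(ch),  # Bidi_Class
        "ccc": unicodedata.combining(ch),       # Canonical_Combining_Class
    }


for label, cp in SAMPLES.items():
    print(f"{label} U+{cp:04X}: {default_properties(cp)}")
```

A library like this offers no hook for substituting private values, so any custom behavior has to be layered on top by the application.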
Re: Private Use areas
2011 Thread: https://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0124.html

Please read in particular these two:
- https://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0174.html
- https://www.unicode.org/mail-arch/unicode-ml/y2011-m08/0212.html

(tl;dr: 1. the PUA set is fixed, 2. being private, the properties may be overridable by conformant implementations.)

On Mon, Aug 20, 2018 at 5:17 PM Ken Whistler via Unicode <unicode@unicode.org> wrote:

> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
> > Is there a block of RTL PUA also?
>
> No.
>
> --Ken
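The first point of the tl;dr can be checked mechanically. As a small sketch (the function name private_use_ranges is made up for this example), the code below scans every code point with Python's standard unicodedata module and collects the contiguous runs whose General_Category is Co.

```python
import unicodedata


def private_use_ranges():
    """Return (first, last) pairs for each contiguous run of gc=Co code points."""
    ranges = []
    start = None
    for cp in range(0x110000):
        is_co = unicodedata.category(chr(cp)) == "Co"
        if is_co and start is None:
            start = cp                      # a run begins
        elif not is_co and start is not None:
            ranges.append((start, cp - 1))  # a run ended just before cp
            start = None
    if start is not None:
        ranges.append((start, 0x10FFFF))
    return ranges


RANGES = private_use_ranges()
TOTAL = sum(last - first + 1 for first, last in RANGES)

for first, last in RANGES:
    print(f"U+{first:04X}..U+{last:04X}")
print(TOTAL, "private-use code points")
```

Because the stability policy makes the set of gc=Co code points immutable, the result should not depend on which Unicode version the Python build ships with.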
Re: Private Use areas
On Tue, Aug 21 2018 at 16:56 +0200, unicode@unicode.org writes:
> On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
>> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
>> > Is there a block of RTL PUA also?
>>
>> No.
>
> Perhaps there should be?
>
> What about designating a part of the PUA to have a specific property? Only
> certain properties matter enough:
> * wide
> * RTL
> * combining
> as most others are better represented in the font itself.
>
> This could be done either by parceling one of existing PUA ranges: planes 15
> and 16 are virtually unused thus any damage would be negligible; or perhaps
> by allocating a new range elsewhere.

I don't think it's a good idea. I think PUA users should provide the properties of the characters used in a form analogical to the Unicode itself, and the software should be able to use this additional information.

Best regards

Janusz

--
             ,
Janusz S. Bien
emeryt (emeritus)
https://sites.google.com/view/jsbien
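One way software could use such additional information is a thin lookup layer that consults a private table first and falls back to the standard data otherwise. The sketch below shows this for Bidi_Class only; the table PRIVATE_BIDI, its contents, and the function names are hypothetical and do not describe any existing implementation or agreement.

```python
import unicodedata

# Hypothetical private agreement: code point -> Bidi_Class override.
PRIVATE_BIDI = {0xE000: "R", 0xE001: "AL"}


def is_private_use(ch):
    """True if the character's General_Category is Co (Private_Use)."""
    return unicodedata.category(ch) == "Co"


def bidi_class(ch, overrides=PRIVATE_BIDI):
    """Bidi_Class lookup that honours overrides for private-use characters only."""
    default = unicodedata.bidirectional(ch)
    if is_private_use(ch):
        return overrides.get(ord(ch), default)
    return default  # standardized characters are never overridden


print(bidi_class("\ue000"))  # overridden by the private table
print(bidi_class("\ue002"))  # no override, standard default applies
print(bidi_class("A"))       # not private use, standard value
```

Restricting overrides to gc=Co code points keeps the layer from changing the behavior of any standardized character.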
Re: Private Use areas
On Mon, Aug 20, 2018 at 05:17:21PM -0700, Ken Whistler via Unicode wrote:
> On 8/20/2018 5:04 PM, Mark E. Shoulson via Unicode wrote:
> > Is there a block of RTL PUA also?
>
> No.

Perhaps there should be?

What about designating a part of the PUA to have a specific property? Only certain properties matter enough:
* wide
* RTL
* combining
as most others are better represented in the font itself.

This could be done either by parceling one of existing PUA ranges: planes 15 and 16 are virtually unused thus any damage would be negligible; or perhaps by allocating a new range elsewhere.

Meow!
--
⢀⣴⠾⠻⢶⣦⠀ What Would Jesus Do, MUD/MMORPG edition:
⣾⠁⢰⠒⠀⣿⡁ • multiplay with an admin char to benefit your mortal [Mt3:16-17]
⢿⡄⠘⠷⠚⠋⠀ • abuse item cloning bugs [Mt14:17-20, Mt15:34-37]
⠈⠳⣄ • use glitches to walk on water [Mt14:25-26]
Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
Rebecca Bettencourt wrote, > Why don't we just get Blissymbolics encoded as it is? The Pipeline still has the Everson proposal from 1998, but Blissymbols are still in the Pipeline. Scripts Encoding Initiative ( http://linguistics.berkeley.edu/sei/ ) page, http://linguistics.berkeley.edu/sei/scripts-not-encoded.html shows Blissymbols and links the same proposal. Blissymbolics Communication International, http://www.blissymbolics.org/ will likely produce the next proposal. Both Scripts Encoding Initiative and Blissymbolics Communication International depend upon funding.
Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
On 8/21/2018 1:01 AM, Julian Bradfield via Unicode wrote:

> On 2018-08-20, Mark E. Shoulson via Unicode wrote:
>> Moreover, they [William's pronoun symbols] are once again an attempt to
>> shoehorn Overington's pet project, "language-independent sentences/words,"
>> which are still generally deemed out of scope for Unicode.
>
> I find it increasingly hard to understand why William's project is out of
> scope (apart from the "demonstrate use first, then encode" principle, which
> is in any case not applied to emoji), when emoji are language-independent
> words - or even sentences: the GROWING HEART emoji is (I presume) supposed
> to be a language-independent way of saying "I love you more every day".
> Which seems rather more fatuous as a thing to put in a writing-systems
> standard than the things I think William would want. Not that I want to
> hear any more about William's unmentionables; I just wish emoji were
> equally unmentionable.

Unicode is descriptive, not prescriptive (or tries to be). In other words, it generally tries to track what people use in writing (including "have used" in the case of obsolete/historic characters and scripts).

Focusing on abstract commonalities misses the point: some things are in use by large, active user communities that have "voted with their feet" to treat these on the same footing as "characters". Being descriptive means that Unicode necessarily will (have to) follow. It does not mean that other items that are formally of a similar category should necessarily be treated the same way: they are ideas and not part of a system that is already in near universal use.

A./
Re: Private Use areas (was: Re: Thoughts on working with the Emoji Subcommittee (was ...))
On Tue, 21 Aug 2018 08:53:18 +0800 via Unicode wrote:

> On 2018-08-21 08:04, Mark E. Shoulson via Unicode wrote:
> > Still, maybe it
> > doesn't really matter much: your special-purpose font can treat any
> > codepoint any way it likes, right?

> Not all properties come from the font. For example a Zhuang character
> PUA font, which supplements CJK ideographs, does not rotate
> characters 90 degrees, when change from RTL to vertical display of
> text.

Isn't that supposed to be treated by an OpenType feature such as 'vert'? Or does the rendering stack get in the way?

However, one might need reflowing text to be about 40% WJ.

Richard.
Re: Thoughts on working with the Emoji Subcommittee (was Re: Thoughts on Emoji Selection Process)
On 2018-08-20, Mark E. Shoulson via Unicode wrote:
> Moreover, they [William's pronoun symbols] are once again an attempt to
> shoehorn Overington's pet
> project, "language-independent sentences/words," which are still
> generally deemed out of scope for Unicode.

I find it increasingly hard to understand why William's project is out of scope (apart from the "demonstrate use first, then encode" principle, which is in any case not applied to emoji), when emoji are language-independent words - or even sentences: the GROWING HEART emoji is (I presume) supposed to be a language-independent way of saying "I love you more every day". Which seems rather more fatuous as a thing to put in a writing-systems standard than the things I think William would want. Not that I want to hear any more about William's unmentionables; I just wish emoji were equally unmentionable.

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Unicode 11 Georgian uppercase vs. fonts
(from 2018-07-27) > Michael Everson responded, > >>> If members of the Georgian user community want to consider this a stylistic >>> difference, they are free to do so. >> >> It isn’t a stylistic difference. It is a different use of capital letters >> than Latin, Cyrillic and other scripts use them. suppose that english was written with a bicameral script, but english users only used the upper case letters for emphasis. in other words, personal names (like bela lugosi), place names (like bechuanaland), and book titles (like "the bridge over the river kwai") would always be in lower case. if someone needed to emphasize something by SHOUTING, they would use all-caps to make this stylistic distinction. if english users called upper case "harcourt" and lower case "fenton", there would be no earthly reason for them to consider switching from fenton to harcourt to be anything other than a stylistic difference. along comes a consortium with script experts and computer encoding experts who rightfully determine that the difference between harcourt and fenton is actually a casing difference, even though the english writing system does not actually use casing in a manner consistent with other bicameral scripts. so the consortium, tasked with breaking down elements of text for computer entry, exchange, and storage, encodes the english script as a casing script. would that action by the consortium alter my perception (as a typical member of the english user community) that the difference between harcourt and fenton is simply stylistic? HECK, no! the same applies to georgian. or any script. whatever the consortium does for computer text processing purposes should NEVER be interpreted as an effort to make the users change their perceptions of their OWN writing systems. we've been through this kind of thing before, with tamil as a notorious example. best regards, james kass