TL;DR: Unicode properties should reflect user expectations, not vendor choices.

Mark Davis ☕️ <m...@macchiato.com>:
> On Mon, Aug 22, 2016 at 11:26 PM, Christoph Päper 
> <christoph.pae...@crissov.de> wrote:
>> 1. it’s incomplete without an explicit neutral/ambiguous alternative and
> 
> As I said, people are actively investigating what to do about such cases. It 
> may be that the solution is to add ⚲ U+26B2 Neuter, but maybe not. We'll see 
> as they develop further.

As a native speaker of a language that can explicitly mark any actor noun with 
a morpheme as female/feminine, but neither as neutral nor as male/masculine – a 
generic version of English ‘actor/actress’, ‘waiter/waitress’, 
‘prince/princess’ – and having dealt intensely with guidelines for corporate 
language and public speech, I can assure you that a feminism/LGBT shitstorm 
will be heading for the UTC and vendors if binary gender becomes mandatory for 
profession emojis. You should not approve Google’s and Apple’s ZWJ sequences 
without a neutral option.

JFTR, I know that ☿ U+263F Mercury is also being proposed to denote 
androgynous/asexual emoji sequences.

>> 2. if they need `Emoji=yes` as a result, this must also be applied to a 
>> bunch of related characters.
> 
> As I said, that is absolutely not a criterion.

As I said, it absolutely should be to honor user expectations.

> If one were to apply that principle (…), then because we have one 
> playing-card emoji, we should make all of the playing cards be emoji; because 
> of one Mahjong tile, one would add all of them. And then add all the chess 
> pieces, and other game pieces.

It’s an open secret that all characters for game notations will have to become 
emojis sooner or later, regardless of whether one of them already has the emoji 
property. (I’m not sure I would have supported them being encoded in the first 
place, though, especially as lots of precomposed characters.) One big problem 
at the moment is, I think, that another user demand as anticipated by vendors 
is that every emoji font and UI should cover all of them.

> And because we have a few circled or squared ideograph and katakana emoji, 
> make all the others emoji. And there are squared or negative ASCII emoji, so 
> add all of the others as emoji.

I already addressed that strawman argument in my previous mail, regarding blood 
types. Precomposed characters with enclosing shapes are just there for 
compatibility reasons, so their Emoji property reflects compatibility needs.

> And alchemical symbols, and ... I suspect the transitive closure of this 
> process could end up marking essentially all Unicode characters with the 
> Emoji property.

No, but probably many, perhaps most, of the characters with ‘General Category = 
Other_Symbol (So), Script = Common, Bidirectional Category = Other_Neutral 
(ON)’, plus a few others (e.g. with ‘Bidirectional Category = L’). That’s a 
little more than 3000 characters as of Unicode 9.0, which includes most 
existing emojis. Some of 
them, like reversed or rotated glyphs, would be simple to support for font 
designers, others could use identical emoji glyphs, e.g. lots of the 
Light/Medium/Bold/Heavy compatibility dingbat arrows, asterisks etc. Overall, 
the number of emojis (not counting Fitzpatrick and ZWJ variants) would less 
than double.
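
For anyone who wants to check the order of magnitude, here is a rough sketch in 
Python. It assumes the standard `unicodedata` module, which exposes General 
Category and Bidirectional Category but not Script, so the ‘Script = Common’ 
filter is left out and the count will come out higher than my figure. It also 
reflects whatever Unicode version your Python bundles, not necessarily 9.0. The 
function name is made up for illustration.

```python
import sys
import unicodedata


def symbol_candidates():
    """Yield code points with General_Category=So and Bidi_Class=ON.

    Assumption: the Script=Common condition is NOT applied, because the
    standard library does not expose the Script property.
    """
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if (unicodedata.category(ch) == "So"
                and unicodedata.bidirectional(ch) == "ON"):
            yield cp


candidates = list(symbol_candidates())

# Report the bundled Unicode version next to the count, since the
# number grows with every release of the standard.
print("Unicode", unicodedata.unidata_version, "->", len(candidates), "candidates")
```

Intersecting this list with the Script property from Scripts.txt and with the 
existing Emoji property from emoji-data.txt would be the obvious next step.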

> The committee has and does consider related characters when looking at 
> properties. But this case was not an oversight. Those particular characters 
> were deliberately chosen. It is always possible to add other characters in 
> the future; it will depend on whether they are deemed to be necessary.

The problem lies within the “deemed to be necessary”.

> The purpose for character properties is to promote interoperability. That has 
> always been the case.

Sure, but for almost all characters and properties this has mostly been a 
descriptive approach, based upon existing texts. Whether a certain character 
will be included in emoji fonts and IMEs very strongly depends on whether it 
has the Emoji property (and how it reacts to VS-15/16). Unicode is hence 
wandering into prescriptive territory here. 
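
To illustrate what I mean by VS-15/16: a variation selector is simply appended 
to the base character to request text or emoji presentation. The sketch below 
only builds the strings. Whether a renderer honours them, and whether a given 
base character actually has standardized variation sequences, is a separate 
matter that the code does not check. The helper name is hypothetical.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def presentation_forms(ch):
    """Return the bare character plus its VS-15 and VS-16 sequences."""
    return {"default": ch, "text": ch + VS15, "emoji": ch + VS16}


# U+2764 HEAVY BLACK HEART, used here purely as an example base character.
forms = presentation_forms("\u2764")

for label, seq in forms.items():
    # Print the code points rather than the glyphs, since the glyphs
    # depend on the fonts installed on the reader's system.
    print(label, " ".join(f"U+{ord(c):04X}" for c in seq))
```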

In the Rifle case, for instance, vendors have even removed emoji glyphs after 
the character, which was specifically proposed for emoji purposes like similar 
ones, became non-emoji late in the standardization process. On the other hand, 
there are lots of legacy emojis that no one uses (or at least not with the 
originally intended meaning), but every emoji font supports. Since emojis are 
often input on mobile devices with some OSes being quite restrictive on 
installing alternative fonts or keyboards, this problem becomes even more 
serious.

> The goal of the emoji properties is to have structure that promotes the 
> highest degree of interoperability among the major implementations supporting 
> emoji.

What’s that, a “major implementation[] supporting emoji”? Is it a font, an OS 
component, a GUI picker, a soft keyboard, a text/input prediction algorithm, a 
text substitution feature …? You seem to be talking about the default setup on 
stock iOS (and Mac OS) and Android, maybe Windows (Phone). This effectively 
means that a few US-based multi-billion-dollar companies – Apple, Google, 
Microsoft and Facebook, basically – decide which character can be used as an 
emoji and which one cannot (while making money on “stickers” at the same time) 
and, unlike the Japanese telcos Docomo, KDDI and Softbank, they increasingly do 
so with an agenda. This is a problem. The UTC could be the voice of the global 
multi-billion-head user base here, but, alas, it’s largely funded and staffed 
by the aforementioned companies and others like them.

You see, if I were an ancient Egyptian chiseling an ejaculating/peeing penis 𓂺 
or a 19th-century typographer drawing a heart-shaped exclamation mark ❣ or a 
late 20th-century Japanese engineer encoding brothels 🏩 as POIs in my mobile 
map application, these would be considered characters and become part of the 
Unicode standard in the 21st century. If there are millions or even billions of 
people who use pictograms for human genitalia in electronic textual 
communication today (as their ancestors did in analog media for 
millennia), they have to rely on conventionalized linguistic 🐱 or graphical 🍆 
metaphors or they must abuse punctuation marks, digits and letters to “draw” 
body parts inline, ({|}) 3==D (.Y.) (_!_) (and *many* variants thereof), if 
they don’t want to resort to actual pictures, which most users are bad at 
drawing and thus must acquire elsewhere, which means additional effort, cost 
and legal issues.

The chance of these pictographs being encoded as single, unambiguous (see 🍑) 
characters is basically nil due to the mentioned gatekeepers. Even if they ever 
made it into the standard, there would still be font vendors who would either 
not ship any glyph for such characters (see U+130BA etc.), only an inferior one 
(see 🥆) or, perhaps worst, a misleading/wrong one (see 🔫), and OS vendors 
might exclude them from input methods (see 🖕) or search engines might ignore 
them (see #🍆) on religious, political or other non-technical grounds.

And yes, I’m preparing a proper proposal for missing body part emojis 
nevertheless, but maybe someone beats me to it.

> It doesn't do any good for Unicode to mark a character as being emoji unless 
> that would result in it being widely deployed as such.

Sorry, but you got that backwards. There are some characters that have 
non-intuitive or unsystematic properties in Unicode, due to mistakes in the 
standardization process or bugs in widespread implementations. This may apply 
as well to some existing emojis (or all of them, for some people), which 
shouldn’t have been in i-mode phones in the first place. It does not apply, 
however, to future emojis, whether made from existing characters or new ones.

If a character is a pictogram that is less abstracted than sinograms and other 
signs used for writing proper, people will want to use it as an emoji (or at 
least find a use for it if it were available). They can only do so if fonts and 
software treat it as such. Most vendors will not do that unless the standard 
says they should, because only then can they expect the competition 
(i.e. potential partners in communication interchange) to do so, too.

A major part of standardization is to document existing (best) practice, but 
another is to synthesize general concepts from this and to develop new 
solutions based on them for better interoperability and user experience in 
the future. Denying some characters the Emoji property on arbitrary grounds 
(incl. demands of high-profile stakeholders), or excluding taboo characters, 
fails the latter.

> So the committee has to consider carefully what implementations will do. That 
> is nothing new; we have to consider carefully what the impact of any change 
> in property (such as Line_Break) will do in implementations. 

🍏 What major implementers want.
🍊 Effect of change on (existing) implementations.

> You can certainly propose (…), that any particular set of additional 
> characters should get the Emoji property, and try to make a case for it. 

Will do, but I’m trying to find out here beforehand whether I’m just wasting my 
time and everyone else’s, because I’m afraid that could indeed be the case.

> But I'd advise you to make a convincing case for your proposal — without 
> using grounds that would apply to hundreds or thousands of other characters. 
> In particular, you should address the question — for each of those characters 
> — of whether there is a strong expectation that it would be frequently used.

That’s trying to scare away useful input from small and independent parties. 
The Unicode process is good at that, but at least it allows for it, unlike many 
other standardization bodies.

Sorry, this got long.
