> On 4 Jun 2015, at 10:59 am, David Starner <[email protected]> wrote:
> 
> On Wed, Jun 3, 2015 at 5:46 PM Chris <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> I personally think emoji should have one, single definitive representation 
> for this exact reason.
> 
> Then you want an image. I don't see what's hard about that.


I already explained why an image and/or HTML5 is not a character, but I’ll 
repeat the points. And the world of characters is not limited to emoji.

1. HTML5 doesn’t separate one particular representation (font, size, etc.) from 
the actual meaning of the character. So you can’t paste it somewhere and expect 
to be able to increase its point size or change its font.
2. It’s highly inefficient in space to drop multi-kilobyte strings into a 
document to represent one character.
3. The entire design of HTML has nothing to do with characters. So there is no 
way to process a string of characters interspersed with HTML elements and know 
which of those elements are a “character”. This makes programmatic manipulation 
impossible, and means most computer applications simply will not allow HTML in 
scenarios where they expect a list of “characters”.
4. There is no way to compare 2 HTML elements and know they are talking about 
the same character. I could put some HTML representation of a character in my 
document, you could put a different one in, and there would be absolutely no way 
to know that they are the same character. Even if we are in the same community 
and agree on the existence of this character.
5. Similarly, there is no way to search or index HTML elements. If an HTML 
document contained an image of a particular custom character, there would be no 
way to ask Google or whatever to find all the documents with that character. 
Different documents would represent it differently. HTML is a rendering 
technology. It makes things LOOK a particular way, without actually ENCODING 
anything about it. The only part of HTML that is searchable in a 
deterministic fashion is the part that is encoded - the Unicode part.
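To make points 3-5 concrete, here is a minimal Python sketch (my own, not from 
anyone’s post; the image URLs are invented for illustration). Encoded characters 
can be iterated, compared, and searched by identity; two HTML image elements 
that “mean” the same character share no identity a program can detect:

```python
# Unicode text: every element of the string IS a character with one
# encoded identity, so iteration, comparison, and search all just work.
s = "snow \u2744 flake"   # U+2744 SNOWFLAKE
t = "snow \u2744 flake"   # produced independently, byte-for-byte equal

chars = list(s)           # iteration yields characters, nothing else
assert "\u2744" in chars

assert s == t             # comparison: trivially equal

assert s.find("\u2744") == 5   # search works on the encoded identity

# The HTML "equivalent": two authors embed two different images for the
# same character. Byte comparison is all a program has, and it fails.
html_a = '<img src="https://example.com/a/snowflake.png">'
html_b = '<img src="https://example.net/b/flake2.gif" width="32">'
assert html_a != html_b   # no way to know these mean the same thing
```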


>  
> The community interested in tony the tiger can make decisions like that. 
> 
> That is a hell of a handwave. In practice, you've got a complex decision 
> that's always going to be a bit controversial, and one that most 
> communities won't bother trying to make.

Apparently the world makes decisions all the time without meeting in committee. 
Strange but true. It’s called making a decision. Facebook have created a lot of 
emoji characters without consulting any committee and it seems to work fine, 
albeit restricted to the Facebook universe because of the lack of a standard.

> 
>  
> You can’t know because they’re images.
> 
> You can't know because the only obvious equivalence relation is exact image 
> identity. 

Because… there is no standard!! If Facebook wants to define two emoji images 
that mean the same thing, maybe one bigger than the other but otherwise 
basically the same, then that would be their choice. Since I expect they have a 
lot of smart people working there, I expect it would work rather well. Just as 
Microsoft issues Courier fonts in different point sizes and we all feel they 
have made that work fairly well.

You seem to be arguing the nonsense position that if someone, for example, made 
a snowflake glyph slightly different from the official Unicode one, then it is 
wrong. That of course is nonsense. People can make sensible decisions about 
this without the Unicode committee.


> 
> You can’t iterate over compressed bits. You can’t process them.
> 
> Why not? In any language I know of that has iterators, there would be no 
> problem writing one that iterates over compressed input. If you need to 
> mutate them, that is hard in compressed formats, but a new CPU can store War 
> in Peace in the on-CPU cache.  

You can’t do it because no standard library, programming language, or operating 
system is set up to iterate over characters of compressed data. So if you want 
to shift compressed bits around in your app, it will take an awful lot of work, 
and the bits won’t be recognised by anyone else.
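To illustrate the point with a quick Python sketch (my own example): nothing 
stops you iterating over the characters of compressed data, but only after an 
explicit, whole-blob decompression step. No standard string or character API 
operates on the compressed bytes directly:

```python
import zlib

text = "compress me \u2603"          # ends with U+2603 SNOWMAN
blob = zlib.compress(text.encode("utf-8"))

# The compressed bytes are opaque: indexing blob gives arbitrary byte
# values with no relationship to any character boundary.

# To get characters back you must decompress and decode in full first:
chars = list(zlib.decompress(blob).decode("utf-8"))
assert chars[-1] == "\u2603"
```

So it is possible, but only as a private scheme layered on top of the real 
character encoding, which is exactly why nobody else’s software would recognise 
the bits.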

Now if someone wants to define the next version of Unicode to be a compressed 
format, and every platform supports that with standard libraries, computer 
languages, etc., then fine, that could work.

Yet again I point out, lots of things MIGHT be possible in the real world IF 
that is how a standard is formulated. But all the chatter about this or that 
technology is pie in the sky without that standard.
