> 
> Well, that's the rub, isn't it?
> 
> We (in IT) are still working pretty dang hard on the simpler problem, to wit:
> 
> There should be a way to put standard characters anywhere that characters 
> belong
> and have things "just work".
> 
> And even *that* is a hard problem that has taken over 25 years -- and is 
> still a work in
> progress.


Unicode is two things. (1) A binary format… the technology bit. (2) And the 
social part: agreeing on what the characters should be.

(1) is, relatively speaking, super easy. Roughly speaking, unique numbers in 
a row (16-bit in Unicode’s original design, 21-bit today). (2) is hard 
because coming to an agreement is hard.
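
To make (1) concrete, here’s a minimal Python sketch showing that text is 
just a sequence of numbers plus one agreed-upon way to serialize them:

    text = "héllo"
    code_points = [ord(c) for c in text]
    print(code_points)                    # [104, 233, 108, 108, 111]
    print(text.encode("utf-8"))           # b'h\xc3\xa9llo' (one binary encoding)
    print("".join(chr(n) for n in code_points))  # round-trips to "héllo"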

What I’m saying is that we could bypass (2) entirely for many use cases if 
people had the power to make their own characters. Yes, it is hard to meet in 
committee and agree on stuff. Don’t force people to do that. You do that by 
putting more work into (1), and less hand-wringing about (2).


> See, the first barrier to getting anywhere with this goal is to get everybody 
> concerned
> with text in IT (or perhaps even worse, all the hundreds of millions of 
> people who
> *use* characters in their devices) to agree what a "custom character" is.

There is no need for such a thing. Everybody knows roughly what the concept of 
a custom character is. What is needed is the technology to do it so that 
everyone can enjoy it seamlessly.

> And if
> the rollicking "discussions" underway about emoji have taught us much of 
> anything,
> it includes the fact that people do *not* all agree about what characters are 
> or
> what should be a candidate for "just working" -- or even what "just work" 
> might
> mean for them, in any case.

That’s because you’re immersed in (2), which is a different kind of problem. 
You don’t have to agree on details if everybody has the power to create new 
characters.

> So before declaring that your position is self-evidently correct about how 
> things
> should just work, it might be a good idea to put some real thought into how
> one would define and standardize the concept of a "custom character" 
> sufficiently
> precisely that there would be a snowball's chance in hell that all the 
> implementations
> of text out there would a) know what it was, b) know how it should display and
> render, c) know how it should be input, stored, and transmitted and d) know 
> how it 
> should be interpreted universally.

I already gave several possible implementation suggestions. I’ll repeat one of 
them here merely to illustrate that it is possible.

Characters are 64 bits. 32 bits are stripped off as the “character set 
provider ID”. That ID is sent to one of many canonical servers, akin to DNS 
servers, to find the URL of the owner of those characters. At that location 
you’d find a number of representations of the character, whether TrueType, 
vector graphics, bitmaps or whatever. The rendering engine would download a 
representation and display it to the user, all without the user having to 
know anything about character sets, custom fonts or whatever.
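
As a minimal sketch of that bit layout (the choice of the *high* 32 bits for 
the provider ID is my assumption for illustration):

    def split_custom_char(char64: int) -> tuple[int, int]:
        # Assumed layout: high 32 bits = character set provider ID,
        # low 32 bits = character ID within that provider's set.
        provider_id = char64 >> 32
        char_id = char64 & 0xFFFFFFFF
        return provider_id, char_id

    # Provider 1234, character 17:
    char64 = (1234 << 32) | 17
    print(split_custom_char(char64))      # (1234, 17)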

So you come across character 12340000000017. The OS asks a charset server who 
owns charset 1234. The server replies “facebook.com/charsets”. The OS then 
asks facebook.com/charsets for the 
facebook.com/charsets/17/truetype/pointsize12 representation.

All this happens invisibly to the user. And of course, if the representation 
is already cached on their machine, none of it would happen at all.
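
Putting the whole flow together, here’s a hedged Python sketch. The registry 
endpoint, path layout and return format are all made-up assumptions that 
merely mirror the facebook.com/charsets example above:

    import urllib.request

    # Hypothetical stand-in for the DNS-like federation of canonical
    # charset servers.
    CHARSET_REGISTRY = "https://charset-registry.example/lookup"

    # Local cache: once fetched, a representation never hits the network again.
    _cache: dict[tuple[int, int, str, int], bytes] = {}

    def resolve_glyph(char64: int, fmt: str = "truetype",
                      pointsize: int = 12) -> bytes:
        """Fetch a renderable representation of a 64-bit custom character."""
        provider_id = char64 >> 32        # the stripped-off provider ID
        char_id = char64 & 0xFFFFFFFF     # the character within that set
        key = (provider_id, char_id, fmt, pointsize)
        if key in _cache:
            return _cache[key]            # already on the machine: no lookup

        # Step 1: ask the registry who owns this charset (the DNS-like query).
        with urllib.request.urlopen(
                f"{CHARSET_REGISTRY}?charset={provider_id}") as resp:
            owner_base = resp.read().decode().strip()  # e.g. "facebook.com/charsets"

        # Step 2: fetch the requested representation from the owner.
        glyph_url = f"https://{owner_base}/{char_id}/{fmt}/pointsize{pointsize}"
        with urllib.request.urlopen(glyph_url) as resp:
            glyph = resp.read()

        _cache[key] = glyph
        return glyph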


