Unicode is messy, but it's not that messy.

The utf-8 encoding places a limit on how many characters can be
encoded: the codespace tops out at U+10FFFF, which is slightly over a
million code points, and if I understand properly, less than a quarter
of those have currently been assigned. Of course... unicode is a
standard and the nice thing about standards is that we have so many to
pick from... but utf-32 is just convenient.
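The codespace arithmetic can be sketched like this (my numbers, not
from the standard text itself, though I believe they match it):

```python
# Unicode code points run from U+0000 through U+10FFFF.
CODESPACE = 0x110000             # 1,114,112 code points total
SURROGATES = 0xE000 - 0xD800     # 2,048 reserved for UTF-16 surrogates

usable = CODESPACE - SURROGATES  # code points actually encodable

print(CODESPACE)   # 1114112
print(usable)      # 1112064
```

So "slightly over a million" is about right, before we even subtract
the surrogate range that UTF-16 reserves.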

Still, even 250k characters is a lot to deal with. Just representing
which sets each of them belongs to is a measurable load. Representing
font information and all of the numerous special rules is going to
occupy a certain amount of space. There are space/time tradeoffs, and
if we require that the language make those tradeoffs for us, the
result will be good for some cases and bad for others.
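To make that "measurable load" concrete, here is a hypothetical
sketch (the representation is my own choice, not anything J does): a
flat bitset recording one yes/no property for every code point.

```python
# One bit per code point over the whole codespace, for a single
# property (say, "is alphabetic").
CODESPACE = 0x110000
bitset = bytearray(CODESPACE // 8)   # one bit per code point

def set_member(cp):
    # mark code point cp as a member of the set
    bitset[cp >> 3] |= 1 << (cp & 7)

def is_member(cp):
    return bool(bitset[cp >> 3] & (1 << (cp & 7)))

set_member(ord('a'))
print(len(bitset))           # 139264 bytes, ~136 KiB per property
print(is_member(ord('a')))   # True
```

That is roughly 136 KiB for every single property you track this way,
which is why real implementations trade time for space with tries or
range tables instead.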

Basically, everyone has to pay for the storage (and other overhead) of
every feature built into the language, every time the language gets
used. That works for some contexts, but I think it plays against J's
strengths.

Also... a key issue here is that, if we cannot model a feature like
this outside the language, we are not ready to implement it within the
language.

Thanks,

-- 
Raul



On Thu, Feb 27, 2014 at 2:00 PM, Björn Helgason <gos...@gmail.com> wrote:
> Unicode was supposed to be the solution to the problems with the APL chars
> as well all the code pages with national characters.
>
> As should be obvious the solution is far from anywhere close.
>
> UTF-8 UTF-16 UTF-32 UTF-64 UTF-???
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm