On 2012/04/28 7:29, Cristian Secară wrote:
On Fri, 27 Apr 2012 12:26:25 -0700, Mark Davis ☕ wrote:
Actually, if the goal is to get as many characters in as possible,
Punycode might be the best solution. That is the encoding used for
internationalized domains. In that form, it uses a smaller number of
bytes per character, but a parameterization allows use of all byte
values.
I suspect the Punycode goal is to take a wide character set into a
restricted character set, without caring much about the resulting
string length; if the original string happens to be in a character set
other than the target restricted character set, then the string length
increases too much to be of interest in the SMS discussion.
Not exactly. Compression was very much a goal when designing punycode.
It won against a number of other algorithms as the choice for IDNs and
is clearly very good for that purpose.
Just do a test: write something in a non-Latin alphabetic script into
this page: http://demo.icu-project.org/icu-bin/idnbrowser
Well, as a silly example, what about
ααααααααααααααααααααααααααααααααααααααααααααααααααααααααα?
(that's 57 α characters). The result is
xn--mxaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,
which is 63 characters long.
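For anyone who prefers to check this offline, here is a minimal sketch in Python, assuming the standard library's "punycode" codec (an RFC 3492 implementation). The helper name to_ace_label is made up for illustration; it only adds the "xn--" prefix and skips the other IDNA processing steps that the demo page may apply. The UTF-8 and UTF-16 byte counts are printed only for comparison and are not part of the original example.

```python
def to_ace_label(text: str) -> str:
    """Hypothetical helper: Punycode-encode one label and prepend "xn--"."""
    # The built-in "punycode" codec produces the bare RFC 3492 encoding;
    # the ACE prefix used for IDNs has to be added separately.
    return "xn--" + text.encode("punycode").decode("ascii")


text = "α" * 57  # the 57 α characters from the example above
label = to_ace_label(text)

print(label)
print(len(text), "characters in the original string")
print(len(label), "ASCII characters in the Punycode label")
print(len(text.encode("utf-8")), "bytes in UTF-8")
print(len(text.encode("utf-16-be")), "bytes in UTF-16")
```

The label comes out at 63 ASCII characters, while the same string takes 114 bytes in either UTF-8 or UTF-16.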
Regards, Martin.