UTF-12

Philippe Verdy Mon, 21 Jun 2010 12:25:31 -0700

Yes, this is smart, especially for its exact mapping to Base64, where
it will even be superior to UTF-8 in many more cases (there should be
a comparison table of sizes between UTF-8, UTF-16 and UTF-12 in the
Base64 transport encoding).
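
To illustrate the "exact mapping" mentioned above: a 12-bit unit splits
into exactly two 6-bit values, so each unit becomes two Base64
characters. The following is only a minimal sketch. The function names
are hypothetical, the high-sextet-first order is an assumption, and it
does not implement the assignment of Unicode characters to 12-bit units
(that is defined on the UTF-12 page, not here).

```python
# Standard Base64 alphabet (RFC 4648, section 4).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def units12_to_base64(units):
    """Map each 12-bit unit to two Base64 characters.

    Assumption: the high 6 bits are emitted first.
    """
    out = []
    for u in units:
        if not 0 <= u < 4096:
            raise ValueError("not a 12-bit unit: %r" % (u,))
        out.append(B64[u >> 6])     # high sextet
        out.append(B64[u & 0x3F])   # low sextet
    return "".join(out)


def base64_to_units12(text):
    """Inverse mapping: every pair of Base64 characters is one 12-bit unit."""
    if len(text) % 2:
        raise ValueError("expected an even number of Base64 characters")
    return [
        (B64.index(text[i]) << 6) | B64.index(text[i + 1])
        for i in range(0, len(text), 2)
    ]
```

Under these assumptions the output always has exactly two characters
per unit, so no "=" padding character is ever produced.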

You should also add, somewhere in the last section of your web
document, that Base64 is well suited not just to 7-bit-only
environments, but also to many 7-bit and 8-bit environments that
require MIME compatibility for controls and spaces (notably in
emails). After all, Base64 was first designed and standardized exactly
for that purpose.

All the Base64 variants, as described in:
http://en.wikipedia.org/wiki/Base64#Variants summary table
will also be usable in the query string appended to URLs, even though
the HTML form data submitted in Base64 with the equivalent (default)
GET method (or with the specified POST method) should only use one of
the two variants:
- 'Base64' encoding as standardized in RFC 3548 or RFC 4648 (with the
explicit HTML form element attributes encoding="base64" and
method="post")
- Modified Base64 encoding for URL applications (with the explicit
HTML form element attributes encoding="base64url" and the default
method="get")

This applies to:
- all URL query parameters in a query string, which are enumerated
and separated by ampersands (&), and then represented as name=value
pairs or just as unnamed values (there will be no conflict with the
Base64 variants that use the equal sign for padding, given that no
Base64 padding is necessary when transporting UTF-12 encoded texts)
- as well as the other Base64 variants for filenames, XML Names,
XML NmTokens, or program identifiers.
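
As a rough illustration of the query-string point, here is a small
sketch using Python's standard urllib.parse.urlencode. Both payload
strings and the parameter name "text" are made-up examples, not real
UTF-12 data. The unpadded URL-safe string passes through unchanged,
while the "+", "/" and "=" characters of the padded standard variant
have to be percent-escaped.

```python
from urllib.parse import urlencode

# Hypothetical payloads, chosen only to show the escaping behaviour.
url_safe = "AbC-_x"      # unpadded, URL-safe alphabet ("-" and "_")
standard = "AbC+/x=="    # standard alphabet with "=" padding

safe_query = urlencode({"text": url_safe})
std_query = urlencode({"text": standard})

print(safe_query)   # text=AbC-_x
print(std_query)    # text=AbC%2B%2Fx%3D%3D
```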

One more question:

Your page is copyrighted and signed by you (with your email address as
the contact); this is absolutely not a problem (in fact it is good
practice for all publications on the web), but there does not seem to
be any proposed licence on your page, so the only way to get one
would be to contact you via your displayed email address.

Can this specification page be licensed by you in an open or free way
on this page, possibly dual-licensed under Creative Commons (CC-BY-SA:
author's attribution required, share-alike) or LGPL (because it
describes an algorithm, comparable to library source code that will
then be freely modifiable and implementable)?

-- Philippe.

On 2010-06-21 at 19:00 CEST, "Andrey V. Lukyanov" wrote:
> As you might guess, UTF-12 is a system for representing Unicode
> characters with a stream of 12-bit units. It was invented recently by
> me.
>
> Full description is here:
>
> http://tapemark.narod.ru/comp/utf12en.html
>
> UTF-12 may be of little use in practice, but it is very nice from the
> theoretical point of view.
