Mark Davis 🍍 <mark at macchiato dot com> wrote:

> Actually, if the goal is to get as many characters in as possible,
> Punycode might be the best solution. That is the encoding used for
> internationalized domains. In that form, it uses a smaller number of
> bytes per character, but a parameterization allows use of all byte
> values.

That might work well if the goal is to find a compact encoding into 7-bit code units, then express 8 such code units in 7 bytes. It would certainly be more economical than UTF-7-over-7, which is fine for ASCII and awful for anything else.

I don't usually think of Punycode as an ideal general-purpose compression encoding, especially with lines of arbitrary length or consisting primarily of non-ASCII content (Cristian's example), but it's certainly worth experimenting with. One advantage might be that encoders and decoders for Punycode already exist, probably in greater numbers than for SCSU.

--
Doug Ewell | Thornton, Colorado, USA
http://www.ewellic.org | @DougEwell
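
For anyone who wants to experiment, here is a rough sketch of the "8 code units in 7 bytes" idea, not a tested proposal. It assumes Python's built-in `punycode` codec as the encoder; the helper names `pack_7bit` and `unpack_7bit` are made up for illustration, and the unpacker assumes the number of code units is carried separately, because the zero padding in the last byte is otherwise ambiguous.

```python
def pack_7bit(units: bytes) -> bytes:
    """Pack 7-bit code units tightly, so 8 units occupy 7 bytes."""
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits currently in acc
    out = bytearray()
    for u in units:
        if u > 0x7F:
            raise ValueError("not a 7-bit code unit")
        acc = (acc << 7) | u
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet written
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_7bit(packed: bytes, count: int) -> bytes:
    """Inverse of pack_7bit; count is the original number of code units."""
    acc = 0
    nbits = 0
    out = bytearray()
    for b in packed:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < count:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
        acc &= (1 << nbits) - 1
    return bytes(out)


def compare(text: str) -> dict:
    """Byte counts for UTF-8, raw Punycode, and Punycode packed 8-in-7."""
    puny = text.encode("punycode")   # always ASCII, so always 7-bit
    return {
        "utf8": len(text.encode("utf-8")),
        "punycode": len(puny),
        "packed": len(pack_7bit(puny)),
    }
```

Calling `compare()` on a few representative lines (mostly ASCII, mixed, and entirely non-ASCII) should show fairly quickly whether the packed form actually beats UTF-8 for the kind of content in question.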

