It's DUDE-8!  It's quick!  It's easy!  It does it all!

The DUDE-8 algorithm compresses 21-bit Unicode Scalar Values into bytes.
It is based on the DUDE(-6) algorithm currently being proposed for
i18n of DNS names.

1.  Let prev = 0.
2.  For each Unicode Scalar Value usv, let prev = prev xor usv.
3.  Let the value of prev be expressed in 21 bits as xxxxxxxyyyyyyyzzzzzzz.
4.  Encode prev in 3 bytes as 0xxxxxxx 0yyyyyyy 1zzzzzzz.
5.  Emit all non-zero bytes.
6.  Repeat until done.
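
The steps above can be sketched in Python roughly as follows.  This is a
minimal sketch that reads the steps literally, so prev carries the running
xor from one scalar value to the next.  The function name dude8_encode is
hypothetical, chosen only for illustration, and the input is assumed to be
valid scalar values that fit in 21 bits.

```python
def dude8_encode(scalars):
    """Encode an iterable of Unicode scalar values, reading steps 1-6 literally."""
    out = bytearray()
    prev = 0                              # step 1
    for usv in scalars:                   # step 6: repeat until done
        prev ^= usv                       # step 2
        x = (prev >> 14) & 0x7F           # step 3: top 7 of the 21 bits
        y = (prev >> 7) & 0x7F            #         middle 7 bits
        z = prev & 0x7F                   #         low 7 bits
        for b in (x, y, 0x80 | z):        # step 4: 0xxxxxxx 0yyyyyyy 1zzzzzzz
            if b:                         # step 5: emit only non-zero bytes
                out.append(b)
    return bytes(out)


print(dude8_encode(map(ord, "AB")).hex(" "))
```

For the input "AB" this prints c1 83: the first byte is 0x41 with the high
bit set, and the second is 0x41 xor 0x42 = 0x03 with the high bit set.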

The optional signature of DUDE-8 is 03 7D FF, which is how U+FEFF appears at the
beginning of a DUDE-8 compression stream: 0xFEFF in 21 bits is
0000011 1111101 1111111, which step 4 encodes as 03 7D FF.

DUDE-8 is simpler than SCSU, but doesn't allow recovery from garbled data,
or even partial random or backward access.

-- 
John Cowan                                   [EMAIL PROTECTED]
One art/there is/no less/no more/All things/to do/with sparks/galore
        --Douglas Hofstadter
