On Tue, 06-11-2007 at 14:35 +0000, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]
> Can't do that, because many of those characters are going to be
> illegal even in CDATA sections.

The first 256 _allowable_ UTF-8 characters are certainly legal in a CDATA section.

> But bear in mind that even then, to encode a single octet will yield
> between 1 and 3 characters.

I would use only those UTF-8 characters that encode to at most 2 bytes, leaving out the 3-byte and longer ones.

And a better mapping:
Bytes that are valid UTF-8 characters are mapped 1 to 1.
Only the invalid ones are mapped to 2-byte characters.

This way, if the "binary" data is ASCII text, it stays human readable.

This is a simple 256-row translation table that could be defined verbatim.

-- 
  /\_./o__ Tomasz Sterna
 (/^/(_^^' Xiaoka.com
._.(_.)_   XMPP: [EMAIL PROTECTED]
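
The translation table described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the set of bytes kept 1 to 1 (tab, LF and printable ASCII) and the 0x100 offset into U+0100..U+01FF are hypothetical choices that the post does not specify, and the names `SAFE`, `OFFSET`, `ENCODE_TABLE`, `DECODE_TABLE`, `encode` and `decode` are made up for the example. XML-level escaping is not handled here.

```python
# Hypothetical sketch of a 256-row byte-to-character translation table.
# Assumption: "safe" bytes are tab, LF and printable ASCII (0x20-0x7E).
# Assumption: every other byte b is mapped to the code point 0x100 + b,
# i.e. into U+0100..U+01FF, which UTF-8 encodes in exactly two bytes.

SAFE = {0x09, 0x0A} | set(range(0x20, 0x7F))
OFFSET = 0x100

# One row per possible octet value.
ENCODE_TABLE = [chr(b) if b in SAFE else chr(OFFSET + b) for b in range(256)]
DECODE_TABLE = {ch: b for b, ch in enumerate(ENCODE_TABLE)}


def encode(data: bytes) -> str:
    """Map each octet to exactly one character."""
    return "".join(ENCODE_TABLE[b] for b in data)


def decode(text: str) -> bytes:
    """Inverse mapping; raises KeyError for characters outside the table."""
    return bytes(DECODE_TABLE[ch] for ch in text)


sample = b"hello\x00\xff"
encoded = encode(sample)
assert decode(encoded) == sample
assert encoded.startswith("hello")        # ASCII text stays readable
assert len(encoded.encode("utf-8")) == 9  # 5 one-byte + 2 two-byte characters
```

With a table like this, each octet yields exactly one character, and no character needs more than 2 bytes on the wire.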
