> WTF-8 could potentially be as compact or more compact than UTF-8 (for 
> Greek, Arabic ...), since much of the Latin-1 and Latin Extended A blocks 
> aren't needed in WCode. If you moved the other characters down to
> fill that space, you might win what you lost to C1 compatibility. 

   A while ago, I tried to perform a similar exercise: work out which 
characters in Unicode are "atomic", and which are compositions of them. Since 
it was more of an engineering "jeu d'esprit" than something that might see the 
light of day in any actual product, I was utterly ruthless: I even decomposed 
'i' into 'dotless i' + 'combining dot above'. (That's not the whole story, 
either: 'combining dot above' is not primitive, as it consists of a 'dot' and a 
notion of "combination".)
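
   As a rough illustration of the idea, here is a minimal Python sketch that 
uses the standard library's canonical decomposition (NFD) as a stand-in for 
the "atomic" breakdown. The helper name `atoms` is hypothetical, and the 
sketch only goes as far as Unicode's own decompositions: it leaves a plain 
'i' intact, whereas the table described above splits that too.

```python
import unicodedata

def atoms(text):
    """Return the names of the code points left after canonical decomposition.

    Assumption: NFD stands in for the "atomic" breakdown. The atomic table
    described in this message goes further than NFD does.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed]

# A precomposed letter splits into a base letter plus a combining mark.
print(atoms("\u00e9"))

# Capital I with dot above decomposes in Unicode itself.
print(atoms("\u0130"))

# Plain 'i' has no Unicode decomposition, so NFD returns it unchanged.
print(atoms("i"))
```

   Counting the distinct names produced over a whole block would give a 
first approximation of the number of atoms in that block, under the same 
assumption.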

   The result is at <http://www.doves.demon.co.uk/atomic.html>. It has been 
mentioned on this list before, but it has been extended and ramified since 
then. It doesn't take Unicode 3 into account.

   You may find it interesting in the context of WCode as it has some of the 
same goals. Your acronym (WTF) is much better though. :-)

   It would be very entertaining to do the same job with the ideographs (down 
to the radical level) and count the number of atoms. I suspect the resulting 
"character set" would contain fewer than 2000 atoms altogether.

   Please do feel free to share any thoughts on the "Atomic Theory" with me!

        /|
 o o o (_|/
        /|
       (_/
