I have numbers for text size and conversion performance of BOCU-1 and SCSU relative to 
UTF-8.

Quick summary:

For Latin text, UTF-8 is best.
For CJK, BOCU-1 and SCSU provide smaller size, with some speed trade-off.
For other scripts, BOCU-1 and SCSU are much better than UTF-8 in both speed and size.
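
The size side of this summary follows from how UTF-8 works: it uses 1 byte per ASCII character, 2 bytes for scripts such as Greek or Cyrillic, and 3 bytes for most CJK characters. Here is a minimal sketch that measures this. The sample strings and the helper name utf8_bytes_per_char are my own illustrations, not taken from the benchmark, and the sketch does not measure BOCU-1 or SCSU, which are not in the Python standard library.

```python
# Sketch: average UTF-8 bytes per code point for short samples in
# different scripts. The sample strings are illustrative assumptions,
# not the benchmark data.
samples = {
    "Latin": "Hello, world",
    "Greek": "Καλημέρα κόσμε",
    "CJK": "日本語のテキスト",
}


def utf8_bytes_per_char(text: str) -> float:
    """Return the encoded UTF-8 length divided by the number of code points."""
    return len(text.encode("utf-8")) / len(text)


for script, text in samples.items():
    print(f"{script}: {utf8_bytes_per_char(text):.2f} bytes per code point")
```

The ratio stays at 1 for ASCII text and rises for the other scripts, which is the overhead that a compression-oriented encoding like BOCU-1 or SCSU is designed to reduce.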

Note that BOCU-1 encoded text (since it preserves control characters and spaces) could 
be used directly in emails, in CVS, etc.

Please see http://oss.software.ibm.com/icu/dropbox/bocuperf.html

Best regards,
markus

