On Oct 9, 2006, at 4:52 PM, Kurt T Stam wrote:

UTF-8 can consume up to 4 bytes per character (up to 6 under the original,
pre-2003 definition). UCS-2 is strictly 2 bytes. However, most people prefer
the 'backwards compatible' UTF-8, where ASCII-range characters still consume
only 1 byte, so it should NOT overflow with ASCII, but it might with Asian
characters. BTW, most Asian (CJK) characters take 3 bytes each, so a rule of
thumb is to multiply your string lengths by 3 when doing i18n.
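A small Python sketch of the byte counts described above. The helper names `utf8_bytes` and `utf16_bytes` are hypothetical. UTF-16 stands in for UCS-2 here because the two agree for characters in the Basic Multilingual Plane (BMP), and Python has no UCS-2 codec.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def utf16_bytes(text: str) -> int:
    """Number of bytes in UTF-16 (little-endian, no byte-order mark)."""
    return len(text.encode("utf-16-le"))

samples = [
    ("ASCII letter", "A"),
    ("Latin-1 letter", "\u00e9"),          # e with acute accent
    ("CJK ideograph", "\u4e2d"),           # a common Chinese character
    ("Emoji outside the BMP", "\U0001F600"),
]

# Print the per-character cost in each encoding.
for label, ch in samples:
    print(f"{label:22} UTF-8: {utf8_bytes(ch)}  UTF-16: {utf16_bytes(ch)}")
```

ASCII stays at 1 byte in UTF-8 and the CJK ideograph takes 3, which is where the "multiply by 3" rule of thumb comes from. The emoji needs 4 bytes in both encodings: UTF-16 uses a surrogate pair, and plain UCS-2 cannot represent it at all.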

Any DB will have this 'problem'.

To some extent this is true, but many other databases seem to "hide" this internally by treating field sizes as a number of characters rather than a number of bytes. In other words, if you use a multi-byte character set like UTF-8 that reserves 3 bytes per character and declare a column of 255 characters, the database internally reserves 765 bytes to cover those 255 characters.
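The arithmetic behind character-sized versus byte-sized columns can be sketched in Python as follows. The function names are hypothetical, and `max_bytes_per_char=3` simply mirrors the example above. Characters outside the BMP would need 4 bytes in full UTF-8.

```python
def reserved_bytes(char_length: int, max_bytes_per_char: int = 3) -> int:
    """Worst-case bytes a column sized in characters must reserve."""
    return char_length * max_bytes_per_char

def fits_char_limit(text: str, char_limit: int) -> bool:
    """True if text has at most char_limit characters (code points)."""
    return len(text) <= char_limit

def fits_byte_limit(text: str, byte_limit: int) -> bool:
    """True if the UTF-8 encoding of text fits in byte_limit bytes."""
    return len(text.encode("utf-8")) <= byte_limit

print(reserved_bytes(255))  # 765, as in the example above

# 100 CJK characters encode to 300 bytes in UTF-8.
cjk = "\u4e2d" * 100
print(fits_char_limit(cjk, 255))  # fits a 255-character column
print(fits_byte_limit(cjk, 255))  # overflows a 255-byte column
```

The same 100-character string is accepted when the limit counts characters but overflows when the limit counts bytes. That difference is the one being discussed here.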

In the 4 series MySQL didn't do this, hence the latin1 character set default. I don't know whether this has changed in the 5 series, but it sure would be nice!

-David
