UTF-8 can consume up to 4 bytes per character (the original design allowed up to 6, but RFC 3629 caps it at 4). UCS-2 is strictly 2 bytes. However, most people prefer the 'backwards compatible' UTF-8, where ASCII-range characters still consume only 1 byte, so a column whose length is counted in bytes should NOT overflow with ASCII data, but it might with Asian characters. BTW, most Asian characters take 3 bytes each in UTF-8, so a rule of thumb is to multiply your string lengths by 3 when doing i18n.
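To make the byte counts concrete, here is a rough Python sketch. The helper names (utf8_len, fits_in_bytes) and the sample strings are made up for illustration, and it assumes the limit is enforced in bytes rather than characters, which may not match how your database counts VARCHAR lengths.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the string occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_in_bytes(text: str, byte_limit: int) -> bool:
    """True if the UTF-8 encoding of text fits in byte_limit bytes.

    Assumption: the storage limit is counted in bytes, not characters.
    """
    return utf8_len(text) <= byte_limit


ascii_text = "a" * 21   # 21 ASCII characters, 1 byte each in UTF-8
cjk_text = "語" * 21     # 21 CJK characters, 3 bytes each in UTF-8

for label, text in (("ascii", ascii_text), ("cjk", cjk_text)):
    print(label, len(text), "chars,", utf8_len(text), "bytes,",
          "fits in 60 bytes:", fits_in_bytes(text, 60))
```

With a 60-byte limit, the 21-character ASCII string fits easily, while the 21-character CJK string needs 63 bytes and does not.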
Any db will have this 'problem'.

--Kurt

David E Jones wrote:
>
> With MySQL 4.1X using UTF-8 really messed up the column sizes because
> it stores each UTF-8 character as 3 bytes (why not 2 I don't know...).
> In other words, if you had a varchar of length 60 and put in a 21
> character UTF-8 string, it will overflow...
>
> I don't know if this is still an issue with the 5 series of MySQL.
>
> -David
>
>
> On Oct 5, 2006, at 7:08 PM, Si Chen wrote:
>
>> Hi -
>>
>> Just curious...why does entityengine.xml set default character set
>> and collation for MySQL to latin1 instead of utf8?
>>
>>
>> Si
>> [EMAIL PROTECTED]
