Re: why is mysql default character set latin1?

David E Jones Mon, 09 Oct 2006 10:35:06 -0700


On Oct 9, 2006, at 4:52 PM, Kurt T Stam wrote:

> UTF-8 can consume up to 4 bytes per character (up to 6 under the
> original RFC 2279 definition). UCS2 is strictly 2 bytes. However, most
> people prefer the 'backwards compatible' UTF-8, where ASCII-range
> characters still only consume 1 byte, so it should NOT overflow using
> ASCII, but it might using Asian characters. BTW, on average it takes
> 3 bytes per character for Asian characters, so a rule of thumb is to
> multiply your string lengths by 3 when doing i18n.
>
> Any db will have this 'problem'.
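
To make the byte counts above concrete, here is a minimal Python sketch that encodes a few sample strings and reports their length in characters, in UTF-8 bytes, and in UCS-2 bytes. It assumes UTF-16-LE as a stand-in for UCS-2, which is only valid for characters in the Basic Multilingual Plane; the sample strings and the helper name `byte_counts` are illustrative, not from the thread.

```python
# Compare bytes needed for the same text in UTF-8 vs. a fixed 2-byte encoding.
# Assumption: UTF-16-LE stands in for UCS-2 (identical for BMP characters).
samples = {
    "ascii": "hello",                 # 5 ASCII characters
    "latin": "caf\u00e9",             # "café": the accented e is outside ASCII
    "cjk": "\u65e5\u672c\u8a9e",      # 3 CJK characters
}


def byte_counts(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, UCS-2/UTF-16 bytes) for text."""
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    chars, utf8, ucs2 = byte_counts(text)
    print(f"{name}: chars={chars} utf8={utf8} ucs2={ucs2}")
```

ASCII text stays at 1 byte per character in UTF-8 (5 bytes for "hello" vs. 10 in UCS-2), while each of the CJK characters takes 3 bytes in UTF-8 (9 bytes total vs. 6 in UCS-2), which is where the "multiply by 3" rule of thumb comes from.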

To some extent this is true, but it seems that many other databases "hide" this internally by treating field sizes as the total number of characters instead of the total number of bytes. In other words, if you are using a multi-byte character set like UTF-8 that reserves 3 bytes per character, and you say your column should be 255 characters, then internally the database will make that 765 bytes to cover the 255 characters you wanted in your column size.
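
The difference between the two interpretations can be sketched in Python. This is a simplified model, not how MySQL or any other database actually implements column storage; the helper names `reserved_bytes` and `fits` are hypothetical, and the 3-bytes-per-character maximum is taken from the example above.

```python
# Simplified model of two ways to interpret a declared column size for a
# multi-byte character set. Not any database's real implementation.
MAX_BYTES_PER_CHAR = 3  # assumption: 3-byte worst case, as in the example above


def reserved_bytes(declared_chars: int, max_bytes_per_char: int = MAX_BYTES_PER_CHAR) -> int:
    """Bytes a database must reserve if the declared size counts characters."""
    return declared_chars * max_bytes_per_char


def fits(value: str, declared_size: int, size_in_chars: bool) -> bool:
    """Check whether value fits when the size counts characters or UTF-8 bytes."""
    if size_in_chars:
        return len(value) <= declared_size
    return len(value.encode("utf-8")) <= declared_size


text = "\u8a9e" * 255  # 255 CJK characters, 3 UTF-8 bytes each

print(reserved_bytes(255))                   # 765
print(fits(text, 255, size_in_chars=True))   # True
print(fits(text, 255, size_in_chars=False))  # False: 765 bytes > 255
```

Counting the size in characters lets the 255-character value fit, at the cost of reserving 765 bytes; counting it in bytes makes the same value overflow, which is the behavior that pushes a latin1 default.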

In the 4 series MySQL didn't do this, hence the latin1 character set default. I don't know if this has changed in the 5 series, but it sure would be nice!


-David
