[Rails-core] Re: ActiveSupport::Multibyte for better Unicode support

Mislav Marohnić Sat, 23 Sep 2006 07:10:35 -0700

Peter,

The problem is correctly supporting multibyte strings. Unicode, the most complete character set, has several encodings (UTF-8 being the most popular one), and each of them expresses some (or all) characters with two or more bytes (unlike ASCII, for instance). In UTF-8, "abc" is a three-character string encoded in 3 bytes, but "čžš" (3 characters from the Croatian alphabet) is encoded in 6 bytes (2 bytes each).
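
To make the byte counts concrete, here is a minimal sketch. It assumes a current Ruby (1.9 or later), where String tracks its encoding and offers both length and bytesize; the variable names are only illustrative and the source file is assumed to be UTF-8.

```ruby
# Sketch only: assumes Ruby 1.9+ and a UTF-8 source file.
ascii    = "abc"
croatian = "čžš"

# Character count versus byte count for each string.
ascii_chars    = ascii.length
ascii_bytes    = ascii.bytesize
croatian_chars = croatian.length
croatian_bytes = croatian.bytesize

# Byte width of each individual character in the Croatian string.
bytes_per_char = croatian.each_char.map(&:bytesize)

puts "#{ascii}: #{ascii_chars} characters, #{ascii_bytes} bytes"
puts "#{croatian}: #{croatian_chars} characters, #{croatian_bytes} bytes"
puts "bytes per character: #{bytes_per_char.inspect}"
```

Both strings have three characters, but only the ASCII one has a byte count equal to its character count.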

Multibyte-unaware programming languages (like Ruby and PHP < 6) assume 1 character = 1 byte. In Ruby, try string.reverse or string.length on strings containing non-ASCII characters to see some unexpected results: reverse will corrupt the string, while length will report the size in bytes, not in characters. These are trivial examples; the problem goes much deeper.
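
The reverse and length behaviour can be reproduced roughly as follows. This is a sketch under an assumption: Ruby 1.9 and later are encoding-aware, so the byte-oriented behaviour described here is emulated by relabelling the string as binary with force_encoding. It is not how the Ruby of this thread was implemented, only an approximation of what it did to the bytes.

```ruby
# Sketch only: assumes Ruby 1.9+ and a UTF-8 source file.
text = "čžš"

# Emulate a byte-oriented String by treating the same bytes as raw binary.
raw = text.dup.force_encoding(Encoding::BINARY)

# Byte-wise view: length counts bytes, reverse shuffles bytes inside characters.
byte_length   = raw.length
byte_reversed = raw.reverse.force_encoding(Encoding::UTF_8)
corrupted     = !byte_reversed.valid_encoding?

# Character-wise view: length counts characters, reverse keeps them intact.
char_length   = text.length
char_reversed = text.reverse

puts "byte-wise length: #{byte_length}, reversed result is corrupt: #{corrupted}"
puts "character-wise length: #{char_length}, reversed: #{char_reversed}"
```

The byte-wise reversal puts continuation bytes ahead of their lead bytes, which is why the result is no longer valid UTF-8.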

Rails needs this.

--
Mislav

On 9/23/06, Peter Michaux <[EMAIL PROTECTED]> wrote:

I'm interested in a general overview of what problem it fixes and why
it is needed. I don't know much about the whole Unicode problem with
Ruby that people keep bringing up and then others say it isn't really a
problem.

Peter

