On 24/09/2006, at 10:20 PM, Joshua Sierles wrote:

>> - Make sure your database character set is utf8
>> - Make sure all your tables have a character set of utf8
>> - Make sure your database.yml has 'encoding: utf8' set for each  
>> database
>
> None of these steps are required officially unless you use utf-8
> specific features of the database (collation). The last setting seems
> to set the connection encoding, which shouldn't be required unless
> there is non-utf8 data stored in the database.

Not true! Collation and character set are separate things.

There are a couple of obvious reasons you want your database  
character set to be UTF8 if you're storing UTF8 strings:

1. When you access the database through the mysql (or pgsql, or  
other) command line, or through tools such as CocoaMySQL, you want  
strings to display properly.

2. MySQL never treats strings as binary; they always have a character  
set, which is latin1 (CP1252) by default. Putting UTF8 data into  
fields marked as latin1 seems like asking for trouble. (There are  
some byte values that are invalid in CP1252, so technically strings  
containing those bytes are illegal. It's only through MySQL's  
laziness in not checking the strings when the connection and table  
character sets match up that you can get away with this at all.)

There are even worse potential pitfalls here too. On one of our  
projects, we did everything except set the connection encoding.  
What happened was that a UTF8 string in Rails would be regarded as  
CP1252 by MySQL, but MySQL knew that the tables needed UTF8, so it  
did a CP1252 to UTF8 conversion on the (already UTF8) string before  
writing it. As you can imagine, we ended up with all sorts of crap in  
the database, and the occasional string got completely munged as  
invalid CP1252 bytes were replaced with question marks.
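
The double conversion described above can be reproduced in plain Ruby, with no database involved. This is only a sketch: it assumes Ruby 1.9 or later (for String#encode), double_encode is a hypothetical helper name, and the "?" substitution is chosen to mirror the behaviour we saw rather than taken from MySQL's actual conversion code.

```ruby
# Sketch: reproduce "double encoding" in plain Ruby (1.9 or later).
# double_encode is a hypothetical helper, not part of Rails or MySQL.
def double_encode(utf8_string)
  # Relabel the UTF-8 bytes as CP1252 without changing them. This is
  # the misreading that happens when the connection encoding is unset.
  mislabeled = utf8_string.dup.force_encoding(Encoding::Windows_1252)

  # Convert the supposed CP1252 text to UTF-8. Bytes that have no
  # CP1252 meaning are replaced with "?" (an assumption made here to
  # mirror the question marks described above).
  mislabeled.encode(Encoding::UTF_8,
                    invalid: :replace, undef: :replace, replace: "?")
end

original = "caf\u00E9"             # "café": 5 bytes in UTF-8
stored   = double_encode(original) # what would end up in the table

puts original.bytesize             # 5
puts stored.bytesize               # 7
puts stored                        # the two bytes of "é" are now two characters
```

Each byte of a multi-byte character becomes a character of its own, so the stored string grows and no longer matches what Rails sent.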

These three things should at least be reduced to a single setting to  
avoid mistakes. I can't imagine a situation in which you would want  
to do one of them without the others.

>> - Put $KCODE='u' in your environment.rb
>
> This is only required if you use unicode strings in your Ruby code.

If your app handles UTF8, then you're going to want to write tests  
involving UTF8 strings, so you're going to need this turned on. You  
do write UTF8 tests for your apps, right? :)

>> - Add an after_filter to application.rb to set the Content-Type
>> header correctly
>
> Rails now defaults to utf-8 Content-Type.

Good to know. I'll take this as an endorsement of the idea that UTF8  
should be the default for Rails apps. :)

Cheers,

Pete Yandell
