[rt-users] utf8 and accents.

2008-08-08 Thread Curtis Bruneau
I need some suggestions. I have come to the conclusion that none of the 
utf8 collations handle French properly, at least not the way latin1 did. 
All accented variants of a letter are treated as the same character: 
although they are distinct at the binary level, they cannot coexist in a 
unique index, sorting treats them as equal, and a query using any 
variant matches all of them.

So I'm in a bit of a bind. If I were to use RT with a case-sensitive 
collation like utf8_bin, would the application behave as expected? I know 
search would be much stricter and possibly confusing to the end user.
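
To make the trade-off concrete, here is a rough Python sketch (not MySQL itself). It assumes an accent- and case-insensitive collation can be approximated by stripping combining marks and case-folding, and a binary collation by comparing raw UTF-8 bytes. The helper names (ci_ai_key, bin_key, unique_values, search) are made up for illustration and are not part of RT or MySQL.

```python
import unicodedata


def ci_ai_key(text):
    """Rough stand-in for a case- and accent-insensitive collation key.

    Assumption: stripping combining marks and case-folding is close enough
    for illustration; MySQL's real weight tables differ in detail.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def bin_key(text):
    """Rough stand-in for a binary collation: compare the UTF-8 bytes as-is."""
    return text.encode("utf-8")


def unique_values(values, key):
    """Return the values a unique index would accept under the given key."""
    seen = set()
    accepted = []
    for value in values:
        k = key(value)
        if k not in seen:
            seen.add(k)
            accepted.append(value)
    return accepted


def search(values, needle, key):
    """Return the values that compare equal to needle under the given key."""
    return [value for value in values if key(value) == key(needle)]


words = ["cote", "côte", "coté", "côté"]

# Insensitive key: all four words collide, so only the first one is accepted.
print(unique_values(words, ci_ai_key))
# Binary key: all four words are distinct and accepted.
print(unique_values(words, bin_key))
# Binary key: a search for "Cote" finds nothing, because the case differs.
print(search(words, "Cote", bin_key))
```

Under these assumptions the insensitive key reproduces the unique-index problem, while the binary key fixes it at the cost of exact-match-only searching.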

My other option would be to continue using latin1. Is there any way to 
do this with the latest code base? It's probably not configurable, and I 
don't want to have to maintain diffs for the necessary changes unless 
they are fairly minimal.

The issue in question - http://bugs.mysql.com/bug.php?id=34130

They said it's on the 'todo' list. MSSQL handles this with ci_ai, ci_as, 
cs_ai and cs_as collations, where case (ci/cs) and accents (ai/as) can 
each be sensitive or insensitive. Hopefully they come around to it.
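
For anyone unfamiliar with that naming, the four combinations can be sketched in Python as below. This is only an approximation built on Unicode normalization, not how MSSQL or MySQL implement collations, and collation_key, equal_under and MODES are hypothetical names.

```python
import unicodedata

# (case_sensitive, accent_sensitive) for each of the four suffixes.
MODES = {
    "ci_ai": (False, False),
    "ci_as": (False, True),
    "cs_ai": (True, False),
    "cs_as": (True, True),
}


def collation_key(text, case_sensitive, accent_sensitive):
    """Build a comparison key; an illustrative approximation only."""
    text = unicodedata.normalize("NFD", text)
    if not accent_sensitive:
        # Drop combining marks so "é" and "e" produce the same key.
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
    if not case_sensitive:
        text = text.casefold()
    return unicodedata.normalize("NFC", text)


def equal_under(mode, a, b):
    """Compare two strings under one of the four modes."""
    case_sensitive, accent_sensitive = MODES[mode]
    return (collation_key(a, case_sensitive, accent_sensitive)
            == collation_key(b, case_sensitive, accent_sensitive))


for mode in MODES:
    print(mode, equal_under(mode, "Été", "ete"), equal_under(mode, "Été", "été"))
```

What French needs here is the ci_as row: "Été" and "été" compare equal, but "Été" and "ete" do not.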

Character differences for MySQL: http://www.collation-charts.org/mysql60/


Curtis


___
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-users

Community help: http://wiki.bestpractical.com
Commercial support: [EMAIL PROTECTED]


Discover RT's hidden secrets with RT Essentials from O'Reilly Media. 
Buy a copy at http://rtbook.bestpractical.com


Re: [rt-users] utf8 and accents.

2008-08-08 Thread Ruslan Zakirov
On Sat, Aug 9, 2008 at 12:20 AM, Curtis Bruneau [EMAIL PROTECTED] wrote:
> I need some suggestions. I have come to the conclusion that none of the
> utf8 collations handle French properly, at least not the way latin1 did.
> All accented variants of a letter are treated as the same character:
> although they are distinct at the binary level, they cannot coexist in a
> unique index, sorting treats them as equal, and a query using any
> variant matches all of them.
>
> So I'm in a bit of a bind. If I were to use RT with a case-sensitive
> collation like utf8_bin, would the application behave as expected? I know
> search would be much stricter and possibly confusing to the end user.

utf8_bin is a good choice. You're free to use a binary collation. Maybe
the utf8_general_ci collation will work better for you. Any collation is
OK as long as you know how to deal with it in MySQL.


> My other option would be to continue using latin1. Is there any way to
> do this with the latest code base? It's probably not configurable, and I
> don't want to have to maintain diffs for the necessary changes unless
> they are fairly minimal.

No, we won't return to that. It's totally wrong and has consequences,
as it actually violates the purpose of the setting. RT was storing
UTF-8 encoded data in a latin1 column, so collations worked absolutely
incorrectly for everything, even latin1 text, and behaved close to
binary.
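
A small Python sketch of why that went wrong. It assumes Python's latin-1 codec is a fair stand-in for how a latin1 column interprets the bytes it is handed, and as_seen_by_latin1 is a made-up helper name.

```python
def as_seen_by_latin1(text):
    """Return what a latin1 column 'sees' when it is handed UTF-8 bytes."""
    return text.encode("utf-8").decode("latin-1")


lower = as_seen_by_latin1("é")
upper = as_seen_by_latin1("É")

# One accented letter becomes two unrelated latin1 characters.
print(len("é"), len(lower))

# A latin1 collation can no longer relate "e" to "é": the lengths differ.
print(len(as_seen_by_latin1("e")), len(lower))

# Case folding, done here with str.lower() as a rough proxy for a
# case-insensitive latin1 comparison, no longer matches "É" with "é".
print(upper.lower() == lower.lower())
```

Plain ASCII text still compares sensibly, but every accented character turns into a two-byte sequence the collation knows nothing about, which is why the old behaviour looked almost binary for accents.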

At this point I can suggest that you either move to a binary collation
or create a new one and send it to the MySQL team for inclusion.


> The issue in question - http://bugs.mysql.com/bug.php?id=34130
>
> They said it's on the 'todo' list. MSSQL handles this with ci_ai, ci_as,
> cs_ai and cs_as collations, where case (ci/cs) and accents (ai/as) can
> each be sensitive or insensitive. Hopefully they come around to it.
>
> Character differences for MySQL: http://www.collation-charts.org/mysql60/
>
> Curtis


-- 
Best regards, Ruslan.


Re: [rt-users] utf8 and accents.

2008-08-08 Thread Curtis Bruneau
Ruslan Zakirov wrote:
> On Sat, Aug 9, 2008 at 12:20 AM, Curtis Bruneau [EMAIL PROTECTED] wrote:
>> I need some suggestions. I have come to the conclusion that none of the
>> utf8 collations handle French properly, at least not the way latin1 did.
>> All accented variants of a letter are treated as the same character:
>> although they are distinct at the binary level, they cannot coexist in a
>> unique index, sorting treats them as equal, and a query using any
>> variant matches all of them.
>>
>> So I'm in a bit of a bind. If I were to use RT with a case-sensitive
>> collation like utf8_bin, would the application behave as expected? I know
>> search would be much stricter and possibly confusing to the end user.
>
> utf8_bin is a good choice. You're free to use a binary collation. Maybe
> the utf8_general_ci collation will work better for you. Any collation is
> OK as long as you know how to deal with it in MySQL.


   
OK, just wondering; I'll give it a try. I was more curious whether 
string-type clauses would still work internally, since binary collations 
are sensitive to everything, case included. I'm guessing that's all fine, 
because I think Postgres stores its data as binary and case sensitive 
and relies on the OS to do collations (something like that; the other 
Postgres databases around here seem to be case sensitive).

>> My other option would be to continue using latin1. Is there any way to
>> do this with the latest code base? It's probably not configurable, and I
>> don't want to have to maintain diffs for the necessary changes unless
>> they are fairly minimal.
 

> No, we won't return to that. It's totally wrong and has consequences,
> as it actually violates the purpose of the setting. RT was storing
> UTF-8 encoded data in a latin1 column, so collations worked absolutely
> incorrectly for everything, even latin1 text, and behaved close to
> binary.
>
> At this point I can suggest that you either move to a binary collation
> or create a new one and send it to the MySQL team for inclusion.

   
Understood, I didn't like that idea either. Oddly enough, 
latin1_swedish_ci (the latin1 default) isn't supposed to be accent 
sensitive, while latin1_general_ci is, yet my old database (MySQL 4.1) 
seems to index accented characters and treat them as separate. The 
collation isn't specified, so I'm assuming swedish, but it's behaving 
like general; perhaps the old version respected the differences. I'm 
basically trying to get the same behaviour as before (perhaps if swedish 
had been enforced before, I wouldn't be in this position). Regardless, 
this isn't really an issue with RT.

>> The issue in question - http://bugs.mysql.com/bug.php?id=34130
>>
>> They said it's on the 'todo' list. MSSQL handles this with ci_ai, ci_as,
>> cs_ai and cs_as collations, where case (ci/cs) and accents (ai/as) can
>> each be sensitive or insensitive. Hopefully they come around to it.
>>
>> Character differences for MySQL: http://www.collation-charts.org/mysql60/
>>
>> Curtis
 
Thanks again for your time. I'm really excited to launch 3.8.x; compared 
to 3.4.x, our users are loving it, especially the reporting and all that.

Curtis.