[rt-users] utf8 and accents.
I need some suggestions. I have come to the conclusion that none of the utf8 collations handle French properly, at least not the way latin1 did. All accented variants of a letter are treated as the same character: although the values are distinct in binary, they cannot coexist under a unique index, sorting treats them as equal, and a query using any one variant matches all of them. So I'm in a bit of a bind. If I were to run RT with a case-sensitive collation like utf8_bin, would the application behave as expected? I know search would become much stricter and possibly confusing to end users.

My other option would be to keep using latin1. Is there any way to do that with the latest code base? It's probably not configurable, and I don't want to maintain my own diffs for the changes unless they are fairly minimal.

The issue in question: http://bugs.mysql.com/bug.php?id=34130

They said it's on their 'todo' list. MSSQL handles this with ci_ai, ci_as, cs_ai and cs_as collations, where accents are either significant or not. Hopefully they'll come around to it.

Character differences for MySQL: http://www.collation-charts.org/mysql60/

Curtis

___
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-users

Community help: http://wiki.bestpractical.com
Commercial support: [EMAIL PROTECTED]

Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
Buy a copy at http://rtbook.bestpractical.com
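To make the accent problem concrete, here is a rough Python sketch (not MySQL's actual algorithm) that models the four sensitivity combinations mentioned above as comparison keys and checks which French words would collide in a unique index. The mode names, `collation_key` and `unique_index_conflicts` are hypothetical; real MySQL collations such as utf8_general_ci use their own per-character weight tables rather than Unicode decomposition, and utf8_bin compares code points directly, which is closest to `cs_as` here.

```python
import unicodedata

# Hypothetical modes borrowed from the MSSQL-style suffixes:
# name -> (case_sensitive, accent_sensitive)
MODES = {
    "ci_ai": (False, False),
    "ci_as": (False, True),
    "cs_ai": (True, False),
    "cs_as": (True, True),
}


def strip_accents(s: str) -> str:
    """Drop combining marks after canonical decomposition (é -> e + U+0301 -> e)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collation_key(s: str, mode: str) -> str:
    """Approximate comparison key: equal keys mean 'equal' under this mode."""
    case_sensitive, accent_sensitive = MODES[mode]
    key = s if accent_sensitive else strip_accents(s)
    if not case_sensitive:
        key = key.casefold()
    # Recompose so precomposed and decomposed input compare the same way.
    return unicodedata.normalize("NFC", key)


def unique_index_conflicts(values, mode):
    """Return (existing, rejected) pairs a unique index would refuse under `mode`."""
    seen = {}
    conflicts = []
    for v in values:
        k = collation_key(v, mode)
        if k in seen:
            conflicts.append((seen[k], v))
        else:
            seen[k] = v
    return conflicts


if __name__ == "__main__":
    words = ["cote", "côte", "coté", "côté", "Côte"]
    for mode in MODES:
        print(mode, unique_index_conflicts(words, mode))
```

Under `ci_ai`, which is the behaviour described above, every variant after `cote` is rejected, while `cs_as` keeps all five apart; that strictness is the trade-off raised about search.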
Re: [rt-users] utf8 and accents.
On Sat, Aug 9, 2008 at 12:20 AM, Curtis Bruneau [EMAIL PROTECTED] wrote:
> So I'm in a bit of a bind. If I were to run RT with a case-sensitive collation like utf8_bin, would the application behave as expected? I know search would become much stricter and possibly confusing to end users.

utf8_bin is a good choice. You're free to use a binary collation. Maybe utf8_general_ci would be better for you. Any collation is OK as long as you know how to deal with it in MySQL.

> My other option would be to keep using latin1. Is there any way to do that with the latest code base? It's probably not configurable, and I don't want to maintain my own diffs for the changes unless they are fairly minimal.

No, we won't go back to that: it was totally wrong and has consequences, since it actually violates the purpose of the setting. RT was storing UTF-8 encoded data in a latin1 column, so collations worked absolutely incorrectly for everything, even for latin1 text, and behaved close to binary. At this point I can suggest that you either move to a binary collation or create a new collation and send it to the MySQL team for inclusion.

-- 
Best regards, Ruslan.
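The point about UTF-8 data in a latin1 column can be illustrated with a small Python sketch, assuming RT wrote raw UTF-8 bytes that MySQL then read as one latin1 character per byte. The helper names are hypothetical, `ci_equal` is only a crude stand-in for a case-insensitive collation, and Python's `latin-1` codec is ISO-8859-1 whereas MySQL's latin1 is really cp1252; that changes how bytes 0x80 to 0x9F display, not the outcome.

```python
def stored_in_latin1_column(text: str) -> str:
    """Model UTF-8 bytes written into a latin1 column: each byte becomes one latin1 char."""
    return text.encode("utf-8").decode("latin-1")


def read_back(stored: str) -> str:
    """Reverse the mislabel: latin1 chars -> original bytes -> UTF-8 text."""
    return stored.encode("latin-1").decode("utf-8")


def ci_equal(a: str, b: str) -> bool:
    """Crude stand-in for a case-insensitive collation comparing characters."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    for word in ["café", "CAFÉ", "cafe"]:
        stored = stored_in_latin1_column(word)
        print(repr(word), "->", repr(stored), len(stored), "chars")

    # Compared as real text, 'é' and 'É' match case-insensitively...
    print(ci_equal("café", "CAFÉ"))
    # ...but after the byte-level mislabel, 'é' is 'Ã©' and 'É' is 'Ã\x89',
    # so the collation sees unrelated characters and they no longer match.
    print(ci_equal(stored_in_latin1_column("café"), stored_in_latin1_column("CAFÉ")))
```

ASCII text passes through unchanged, so only accented letters lose their case and accent relationships, which is why such a setup can look close to binary for French text.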
Re: [rt-users] utf8 and accents.
Ruslan Zakirov wrote:
> On Sat, Aug 9, 2008 at 12:20 AM, Curtis Bruneau [EMAIL PROTECTED] wrote:
>> If I were to run RT with a case-sensitive collation like utf8_bin, would the application behave as expected?
>
> utf8_bin is a good choice. You're free to use a binary collation. Maybe utf8_general_ci would be better for you. Any collation is OK as long as you know how to deal with it in MySQL.

OK, just wondering; I'll give it a try. I was mostly curious whether string-type clauses would still work internally, since binary collations are sensitive to everything, including case. I'm guessing that's all fine, because I think Postgres stores its data as binary_cs and relies on the OS to do collation (something like that; the other Postgres databases around here seem to be case-sensitive).

>> My other option would be to keep using latin1. Is there any way to do that with the latest code base?
>
> No, we won't go back to that: it was totally wrong and has consequences, since it actually violates the purpose of the setting. [...]

Understood, I wasn't liking that idea either. Oddly enough, latin1_swedish_ci (the latin1 default) isn't supposed to be accent-sensitive, while latin1_general_ci is, but my old database (MySQL 4.1) seems to index accented values and treat them as separate. The collation isn't specified, so I'm assuming it's swedish, but it behaves like general; perhaps the old version respected the differences. I'm basically trying to get things working the same as before (perhaps if swedish had been enforced before, I wouldn't be in this position). Regardless, this isn't really an issue with RT.

Thanks again for your time. I'm really excited to launch 3.8.x; compared to 3.4.x our users are loving it, especially the reporting and all that.

Curtis.