Emmanuel Dreyfus <[email protected]> wrote:

> I just upgraded Apache to 2.4 and RT to the latest 3.8, and I get a charset
> problem: anything that enters RT through rt-mailgate is fine, but any
> non-ASCII character sent through the web interface gets corrupted: I get a ?
> in a square instead, which is usually what happens when an ISO-8859-1
> character is mistaken for UTF-8.
>
> Older messages from before the upgrade display correctly, hence this is
> really a problem at message POST time.
I fixed it. I am replying to myself with the whole story, for someone else's future reference.

The problem was the database encoding. RT can use PostgreSQL with either the "UTF-8" encoding or the default "SQL_ASCII". With the latter, PostgreSQL does not care about encoding at all: it just gives back the bytes it was given, without any check. The former enforces UTF-8 and is able to transcode automatically if the client claims to use another encoding.

My RT installation had been running for a while with its PostgreSQL database in "UTF-8" encoding. At some point I upgraded PostgreSQL, reinitialized the database and reloaded the data from a dump. Since I did not check for it, the new database got "SQL_ASCII", a setup where the application must take care of data encoding itself.

RT stores data as UTF-8, but it seems some conversions are missing in the code, especially on ticket creation through the web. I did not find where it happens, but this action was putting ISO-8859-1 characters into the database. After a few weeks, I had a database randomly mixing ISO-8859-1 and UTF-8 data.

Fixing the situation required dumping the database, dropping it, creating it again with "UTF-8" encoding and reloading the dump. But doing so required cleaning every ISO-8859-1 character out of the dump first, otherwise PostgreSQL could not load it. iconv(1) could not help, since the database also contained genuine UTF-8 characters, which a blanket ISO-8859-1 to UTF-8 conversion would have mangled. I had to write external C functions for PostgreSQL to perform queries such as:

    update attachments set content = qpfix(content),
        contentencoding = 'quoted-printable'
        where not is_utf8(content);

is_utf8() is an external function that finds character sequences invalid in UTF-8.
qpfix() is an external function that translates ISO-8859-1 into quoted-printable UTF-8.

That kind of fix had to be done in various columns of the attachments, users and transactions tables. I can share the C code if someone is interested.
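The original C code is not included in this message, so here is only a minimal stand-alone sketch of what the two helpers might do, assuming plain byte buffers. The names is_utf8() and latin1_to_qp_utf8() are hypothetical, and the PostgreSQL wrapping (PG_FUNCTION_ARGS, text/bytea handling) is deliberately left out. The encoder is also simplified: it escapes non-ASCII bytes and '=' as =XX but does not insert the 76-column soft line breaks of strict quoted-printable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the author's extension code:
 * returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        unsigned int cp, min;

        if (c < 0x80) { i++; continue; }             /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                               /* stray continuation or bad lead byte */

        if (len - i - 1 < extra) return 0;           /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= extra; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                                /* overlong, out of range, or surrogate */
        i += extra + 1;
    }
    return 1;
}

/* Hypothetical sketch of the qpfix() idea: convert ISO-8859-1 input to UTF-8,
 * then quoted-printable encode it (non-ASCII bytes and '=' become =XX).
 * Writes a NUL-terminated string to out and returns its length,
 * or (size_t)-1 if out is too small. */
size_t latin1_to_qp_utf8(const unsigned char *in, size_t len, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    assert(out != NULL && outsz > 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char utf8[2];
        size_t n;

        /* Each ISO-8859-1 byte is the Unicode code point of the same value. */
        if (in[i] < 0x80) { utf8[0] = in[i]; n = 1; }
        else { utf8[0] = 0xC0 | (in[i] >> 6); utf8[1] = 0x80 | (in[i] & 0x3F); n = 2; }

        for (size_t k = 0; k < n; k++) {
            unsigned char b = utf8[k];
            if (b >= 0x80 || b == '=') {
                if (o + 3 >= outsz) return (size_t)-1;   /* keep room for the NUL */
                out[o++] = '=';
                out[o++] = hex[b >> 4];
                out[o++] = hex[b & 0x0F];
            } else {
                if (o + 1 >= outsz) return (size_t)-1;
                out[o++] = (char)b;
            }
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the ISO-8859-1 bytes "caf\xe9" fail is_utf8() and come out of latin1_to_qp_utf8() as "caf=C3=A9". Note that the check is a heuristic: ISO-8859-1 text that happens to form valid UTF-8 sequences (such as "Ã©") passes it and would be left untouched.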
After the proper fix, the database dump could be reimported into the UTF-8 encoded database, and the charset trouble on ticket creation from the web disappeared.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
[email protected]
