Hi Greg:

On Fri, 26 Mar 2010, Gregory Favre wrote:
> I recently discovered that my database was using latin1 default
> encoding. I switched to utf-8 (which means also sql CONVERTs and
> friends). Everything seemed to work fine (except some strangely
> encoded titles), but then I discovered some terrible indexing
> problems.

From your description it seems the DB tables are well set up
UTF-8-wise. Can you check for which records süsstrunk vs susstrunk
appear in the index? If you isolate example record IDs for both forms,
you can then check those records' MARCXML values and bibxxx table
values to see whether the stored values differ. Chances are they will.

Since you mention there were some title encoding troubles, maybe the
tables were not fully converted from Latin-1 to UTF-8? The conversion
usually goes like:

  $ mysqldump -u root -p cdsinvenio --default-character-set=latin1 collectionname > z.sql
  $ vi z.sql # change "SET NAMES latin1" to "SET NAMES utf8" and/or "DEFAULT CHARSET=latin1" to "DEFAULT CHARSET=utf8"
  $ cat z.sql | mysql --default-character-set=utf8 -u root -p cdsinvenio

(very schematically speaking)

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
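
When comparing the stored values for the two forms, it can help to look at code points rather than at what the terminal renders. The following is only a minimal sketch, not part of Invenio: the helper names `describe` and `looks_double_encoded` are hypothetical, and it assumes the values have already been fetched from MARCXML or the bibxxx tables as Python strings. It flags the common case where UTF-8 bytes were read as Latin-1 during a conversion.

```python
def describe(value: str) -> str:
    """Return the code points of a string, so that two values that
    look alike on screen can be compared unambiguously."""
    return " ".join(f"U+{ord(ch):04X}" for ch in value)


def looks_double_encoded(value: str) -> bool:
    """Heuristic (hypothetical helper): True if the value looks like
    UTF-8 bytes that were misread as Latin-1 at some point."""
    try:
        # Undo the suspected misreading: back to bytes as Latin-1,
        # then decode those bytes as UTF-8.
        repaired = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the value has characters outside Latin-1, or its
        # bytes are not valid UTF-8: not the double-encoding pattern.
        return False
    # Pure ASCII round-trips unchanged, so it is not flagged.
    return repaired != value


if __name__ == "__main__":
    good = "süsstrunk"
    # Simulate a value damaged by a Latin-1/UTF-8 mix-up.
    damaged = good.encode("utf-8").decode("latin-1")
    for v in (good, damaged, "susstrunk"):
        print(repr(v), looks_double_encoded(v), describe(v))
```

A value that is flagged here would suggest that the corresponding row went through the conversion with the wrong character set at one step. This is a heuristic only, and it would not catch other causes such as differing Unicode normalization forms.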
