Hi Greg:

On Fri, 26 Mar 2010, Gregory Favre wrote:
> I recently discovered that my database was using latin1 default
> encoding. I switched to utf-8 (which means also sql CONVERTs and
> friends). Everything seemed to work fine (except some strangely
> encoded titles), but then I discovered some terrible indexing
> problems. 

From your description, it seems the DB tables are well set up UTF-8-wise.

Can you check in which records süsstrunk and in which susstrunk appear
in the index?  If you isolate example record IDs for both forms, you can
then compare those records' MARCXML values and their bibxxx table values
to see whether the stored values differ.  Chances are they will.
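One difference worth looking for is double encoding: UTF-8 bytes that were read as Latin-1 and converted to UTF-8 a second time, so that ü (bytes C3 BC) ends up stored as C3 83 C2 BC.  Below is a minimal sketch of a check for a single value copied from the MARCXML or from a bibxxx row.  It assumes the standard iconv tool is available, and the function name looks_double_encoded is hypothetical.  It treats Latin-1 as ISO-8859-1; MySQL's latin1 is actually closer to Windows-1252, so some characters (curly quotes, dashes) may slip through this check.

```shell
# Hypothetical helper: succeed (exit 0) if the UTF-8 string in $1 looks
# double-encoded, i.e. it becomes different, valid UTF-8 once its characters
# are mapped back to single Latin-1 bytes.
looks_double_encoded() {
  local v=$1 fixed
  # Decode as UTF-8 and re-encode as Latin-1; this fails if the value holds
  # characters outside Latin-1, which a double-encoded value would not.
  fixed=$(printf '%s' "$v" | iconv -f UTF-8 -t ISO-8859-1 2>/dev/null) || return 1
  # The recovered bytes must themselves be valid UTF-8.
  printf '%s' "$fixed" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 || return 1
  # Pure ASCII comes back unchanged and is not double-encoded.
  [ "$fixed" != "$v" ]
}
```

For example, `looks_double_encoded "$value" && echo "double-encoded"` would flag a value stored as "sÃ¼sstrunk" but not one stored correctly as "süsstrunk".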

Since you mention there were some title encoding troubles, perhaps the
tables were not fully converted from Latin-1 to UTF-8?  The conversion
usually goes like this:

 $ mysqldump -u root -p cdsinvenio --default-character-set=latin1 collectionname > z.sql
 $ vi z.sql # change "SET NAMES latin1" to "SET NAMES utf8" and/or "DEFAULT CHARSET=latin1" to "DEFAULT CHARSET=utf8"
 $ cat z.sql | mysql --default-character-set=utf8 -u root -p cdsinvenio

(very schematically speaking)
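If you prefer to script the editing step instead of doing it in vi, a minimal sketch follows, assuming the dump uses the same "SET NAMES latin1" and "DEFAULT CHARSET=latin1" strings mentioned above.  The function name relabel_dump_charset is hypothetical; it writes to a new file rather than using sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: copy dump file $1 to $2, relabelling the Latin-1
# charset declarations as UTF-8 so that the reload treats the bytes as UTF-8.
relabel_dump_charset() {
  local in=$1 out=$2
  sed -e 's/SET NAMES latin1/SET NAMES utf8/g' \
      -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/g' \
      "$in" > "$out"
}
```

Usage, following the steps above:

 $ relabel_dump_charset z.sql z-utf8.sql
 $ mysql --default-character-set=utf8 -u root -p cdsinvenio < z-utf8.sql

This is a blind text substitution, so it would also rewrite any stored data that happens to contain those exact strings; check the result before loading it.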

Best regards
-- 
Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
