Hi Tibor,
Thanks for your reply! 
> 
> Can you check for which records süsstrunk vs susstrunk appear in the
> index?  If you isolate record ID examples for both forms, then check
> those records' MARCXML values and the bibxxx table values, to see if
> there is some difference between them in stored values?  Chances are
> there will be.
> 

It seems that both the bibxxx tables and the idxWORDXXX tables have issues: 
duplicate entries, and ? characters that appear to be literal question marks 
stored in the data rather than a display encoding problem. 
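
For reference, here is roughly how one can tell a literal question mark from an 
encoding problem by looking at the raw bytes. This is only a sketch: 
`hex_bytes` is a helper name made up for this mail, and the `bib70x` table in 
the commented query is an assumption about where the author field is stored.

```shell
# hex_bytes: print the bytes of its argument as space-separated hex pairs.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# A literal question mark is the single byte 3f:
hex_bytes 's?sstrunk'
# prints: 73 3f 73 73 74 72 75 6e 6b

# A correctly stored UTF-8 u-umlaut is the byte pair c3 bc
# (octal escapes are used so the example does not depend on the mail encoding):
hex_bytes "$(printf 's\303\274sstrunk')"
# prints: 73 c3 bc 73 73 74 72 75 6e 6b

# A doubly encoded u-umlaut (UTF-8 read as Latin-1, then re-encoded) is c3 83 c2 bc:
hex_bytes "$(printf 's\303\203\302\274sstrunk')"
# prints: 73 c3 83 c2 bc 73 73 74 72 75 6e 6b

# The same check inside MySQL would use HEX(); the table name is an assumption:
# mysql -u root -p cdsinvenio -e "SELECT id, HEX(value) FROM bib70x WHERE value LIKE '%sstrunk%'"
```

If the stored value shows 3f where the umlaut should be, the original character 
is gone and re-encoding the table will not bring it back.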

> Since you mention there were some title encoding troubles, maybe the
> tables were not fully properly converted from Latin-1 to UTF-8?  The
> conversion usually goes like:

The other tables (bibdoc, collectionname and friends) seem to be encoded 
correctly, though. 
Is there a way to rebuild the bibxxx and idxWORD tables completely from the 
bibfmt tables? Of course, I'd rather not truncate the bibrec tables and 
reinsert the 60'000 XML files ;-)

Best regards,
Greg

> 
> $ mysqldump -u root -p cdsinvenio --default-character-set=latin1 collectionname > z.sql
> $ vi z.sql # change "SET NAMES latin1" to "SET NAMES utf8" and/or "DEFAULT CHARSET=latin1" to "DEFAULT CHARSET=utf8"
> $ cat z.sql | mysql --default-character-set=utf8 -u root -p cdsinvenio
> 
> (very schematically speaking)
> 
> Best regards
> -- 
> Tibor Simko ** CERN Document Server ** <http://cds.cern.ch/>
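
The manual vi step in the quoted recipe could also be scripted when several 
tables need the same treatment. A minimal sketch, assuming the dump contains 
exactly the two strings quoted above; `relabel_dump` and the output file name 
are made up for illustration, and the demo runs on a tiny sample file rather 
than on a real dump.

```shell
# relabel_dump IN OUT: copy a dump file, switching its charset declarations
# from latin1 to utf8. The input file is left untouched.
relabel_dump() {
    sed -e 's/SET NAMES latin1/SET NAMES utf8/' \
        -e 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8/' \
        "$1" > "$2"
}

# Demo on a two-line sample that imitates the relevant parts of a dump.
demo_dir=$(mktemp -d)
printf '%s\n' \
    '/*!40101 SET NAMES latin1 */;' \
    ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' > "$demo_dir/z.sql"

relabel_dump "$demo_dir/z.sql" "$demo_dir/z.utf8.sql"
cat "$demo_dir/z.utf8.sql"
```

Note that sed rewrites every matching line, so a data row that happens to 
contain one of these strings would be changed too; it is worth diffing the two 
files before loading the result.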

____________________________________________________________________

Gregory Favre
Coordinateur Infoscience
École Polytechnique Fédérale de Lausanne
KIS - DIT
Case Postale 121
CH-1015 Lausanne
+41 21 693 22 88
+ 41 79 599 09 06
[email protected]
http://plan.epfl.ch/?sciper=128933
____________________________________________________________________



