Dear all, I have a problem with snippets for documents containing accented characters.
In the displayed snippets I get '?' characters instead of the accented letters. I attach a patch that seems to solve it. The problem comes from the fact that the indexes (computed by get_tokens) should be computed on a unicode string instead of a standard (byte) string. For example: 'été'[0:3] is not the same as u'été'[0:3].
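To illustrate the slicing difference, here is a minimal sketch. It is not the code from the patch: it assumes UTF-8 encoded text and is written for Python 3, where a Python 2 standard string corresponds to `bytes`. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: the text is UTF-8 encoded, so each 'é' takes two bytes.
text = u"été"                 # unicode string: 3 characters
raw = text.encode("utf-8")    # byte string: 5 bytes (what a Python 2 str holds)

char_slice = text[0:3]        # slices characters: the whole word
byte_slice = raw[0:3]         # slices bytes: only 'é' (2 bytes) + 't'

# A byte index that falls inside a multi-byte character leaves an
# incomplete sequence, which is displayed as a replacement character.
broken = raw[0:1].decode("utf-8", "replace")

print(len(text), len(raw))
print(char_slice, byte_slice.decode("utf-8"), repr(broken))
```

So offsets computed on the byte string do not line up with the characters, which would explain the '?' in the snippets.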
0001-Bad-unicode-chars-for-snippets-display.patch
Another problem: a full-text search with an accented query does not retrieve documents containing the accented term. For example, if I search for "Méthode" in French, I retrieve all the English documents containing "Method" but not the French document containing "Méthode fonctionnelle". Some pre-processing is probably required to remove accents before the SOLR indexing. Moreover, we also need to modify the snippets so that accents are kept for keywords during the grep process.

Regards,
----------------------------------------------------------------------
Johnny Mariéthoz
RERO, Av. de la Gare 45, CH - 1920 MARTIGNY
Téléphone: +41(0)27 721 8579
Fax : +41(0)27 721 8586
Web : http://www.rero.ch
ReroDoc : http://doc.rero.ch, [email protected]
----------------------------------------------------------------------
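P.S. As a rough idea of the accent-removal pre-processing mentioned above, here is a minimal sketch using only the Python standard library. The function name `strip_accents` is hypothetical, and I have not checked how this would plug into the existing indexing code; it only shows one possible way to fold accents on both the indexed text and the query.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Return text with combining accents removed (hypothetical helper)."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


indexed = strip_accents(u"Méthode fonctionnelle")
query = strip_accents(u"Méthode")
print(indexed)
print(query in indexed)
```

The snippets should still be built from the original text, so that the accents are kept in what is displayed.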
