Hello Ferran,

>> we have changed the default word tokenizer to properly account for
>> Czech accents and now need to rebuild all the indexes. All went well
>> except that the global virtual index refuses to reindex.
>
> I'm unsure that the way to tackle this is in the word tokenizer;
> shouldn't it be done in the strip_accents function? Some years ago I
> proposed to change its implementation:

Yes, we have actually changed the strip_accents function, but as a result
the tokenization has changed and the global virtual index refuses to
fully recognize this.

> https://github.com/inveniosoftware/invenio/issues/425

Our change to strip_accents was a bit more opportunistic: we just added
some more accented letters to the repertoire of regexps used there, and
we also added Unicode normalization as the initial step.
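Roughly, the new behaviour looks like this (a minimal sketch only, not the
actual patch; the function name, the choice of NFC normalization and the
abridged regexp table are just illustrative assumptions):

    # -*- coding: utf-8 -*-
    import re
    import unicodedata

    # Abridged mapping of accented letters to their base forms; the real
    # repertoire covers many more characters (illustrative only).
    _ACCENT_REGEXPS = [
        (re.compile(u'[áàâä]'), u'a'),
        (re.compile(u'[čç]'), u'c'),
        (re.compile(u'[ďđ]'), u'd'),
        (re.compile(u'[éèêëěę]'), u'e'),
        (re.compile(u'[íìîï]'), u'i'),
        (re.compile(u'[ňñ]'), u'n'),
        (re.compile(u'[óòôö]'), u'o'),
        (re.compile(u'[řŕ]'), u'r'),
        (re.compile(u'[šś]'), u's'),
        (re.compile(u'[ťț]'), u't'),
        (re.compile(u'[úùûüů]'), u'u'),
        (re.compile(u'[ýÿ]'), u'y'),
        (re.compile(u'[žźż]'), u'z'),
    ]

    def strip_accents_sketch(text):
        """Return text with accents removed.

        Unicode normalization runs first so that composed and decomposed
        forms of the same letter look identical to the regexps below.
        """
        text = unicodedata.normalize('NFC', text)
        for regexp, replacement in _ACCENT_REGEXPS:
            text = regexp.sub(replacement, text)
        return text

    # e.g. strip_accents_sketch(u'Dvořák') == u'Dvorak'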
> I did not propose a patch because I don't know how to implement the
> tests.

Me neither :(

Regards,
Petr

