Note, for example, that Google manages to sort out issues like these. It
sees past diacritics and even case endings.
I guess they must normalize all inputs to some standard form and then search / eigenvectorize on those. There are quite a few diacritics, and a fair few glyphs they could be applied to, so I doubt they could map every possible combination to a private-use range.
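
To make the "normalize to a standard form" idea concrete, here is a minimal sketch in Python. It is only a guess at the general technique, not a description of what Google actually does. The function name `fold` is made up for illustration; the `unicodedata` calls are from the standard library. The trick is to decompose each character into a base glyph plus combining marks, so no table of every glyph/diacritic combination is needed.

```python
import unicodedata


def fold(text: str) -> str:
    """Reduce text to a diacritic-free, case-insensitive search key.

    Hypothetical helper for illustration only.
    """
    # NFD splits precomposed characters into base glyph + combining marks,
    # e.g. U+00E9 becomes "e" followed by U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks; combining() returns 0 for base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left and fold letter case for comparison.
    return unicodedata.normalize("NFC", stripped).casefold()


if __name__ == "__main__":
    # Precomposed and decomposed spellings end up with the same key.
    print(fold("Café") == fold("CAFE\u0301"))
```

Both the query and the indexed text would go through the same function before comparison. This only covers diacritics and letter case; seeing past grammatical case endings would need some kind of stemming on top, which this sketch does not attempt.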
