On Thu, Jun 23, 2022 at 02:10:42PM +0200, Przemysław Sztoch wrote:
> The only division that is probably possible is the one attached.

Well, the addition of Cyrillic does not require the removal of SOUND RECORDING COPYRIGHT or the DEGREEs. Keeping them would imply using a dictionary when manipulating the set of codepoints, but that's me being too picky. Just to say that I am fine with what you are proposing here.

By the way, could you add a couple of regression tests for each patch with a sample of the characters added? U+210C is a particularly sensitive case, as we should really make sure that it maps to what we want even if Latin-ASCII.xml tells a different story. This requires adding a couple of queries to unaccent.sql, with the expected output updated in unaccent.out.

-- 
Michael
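The dictionary idea mentioned in this review can be illustrated with a small Python sketch. It is only an illustration under assumptions: the table names, `build_rules()` and `unaccent_like()` are hypothetical and not taken from any PostgreSQL script, the entries are a small sample, and the target "H" for U+210C is chosen here only because it matches that character's Unicode compatibility decomposition. Actual regression tests would still be SQL queries in unaccent.sql with the expected output in unaccent.out.

```python
import unicodedata

# Hypothetical sample tables: codepoint -> replacement string.
LETTERLIKE = {
    0x2103: "\u00b0C",  # DEGREE CELSIUS -> degree sign + "C"
    0x2109: "\u00b0F",  # DEGREE FAHRENHEIT -> degree sign + "F"
    0x2117: "(P)",      # SOUND RECORDING COPYRIGHT
}

CYRILLIC = {
    0x0401: "\u0415",  # CYRILLIC CAPITAL LETTER IO -> CAPITAL LETTER IE
    0x0451: "\u0435",  # CYRILLIC SMALL LETTER IO -> SMALL LETTER IE
}

# Explicit override for U+210C (BLACK-LETTER CAPITAL H), so its mapping
# does not depend on an upstream transliteration table. "H" is an assumption.
OVERRIDES = {0x210C: "H"}

# Sanity check: the assumed target agrees with the NFKD decomposition.
assert OVERRIDES[0x210C] == unicodedata.normalize("NFKD", chr(0x210C))


def build_rules(*tables):
    """Merge tables into one dict keyed by codepoint.

    Adding one group (e.g. Cyrillic) never drops entries from another;
    a later table only wins when both define the same codepoint.
    """
    rules = {}
    for table in tables:
        rules.update(table)
    return rules


def unaccent_like(text, rules):
    """Replace each character that has a rule; keep the others unchanged."""
    return "".join(rules.get(ord(ch), ch) for ch in text)


rules = build_rules(LETTERLIKE, CYRILLIC, OVERRIDES)
print(unaccent_like("\u0401\u0436 \u2103 \u210c", rules))
```

Keying on the codepoint makes each addition or override an explicit, independent entry, which is the property the review asks for when new character groups are added alongside existing special cases.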