On Tue, Jun 28, 2022 at 02:14:53PM +0900, Michael Paquier wrote:
> Well, the addition of cyrillic does not make necessary the removal of
> SOUND RECORDING COPYRIGHT or the DEGREEs, that implies the use of a
> dictionary when manipulating the set of codepoints, but that's me
> being too picky.  Just to say that I am fine with what you are
> proposing here.

So, I have been looking at the change for cyrillic letters, and are
you sure that the range of codepoints [U+0410,U+044F] is right when it
comes to considering all those letters as plain letters?  There are a
couple of characters that bother me a bit with this range:
- What of the letters CAPITAL SHORT I (U+0419) and SMALL SHORT I
  (U+0439)?  Shouldn't U+0439 be translated to U+0438 and U+0419
  translated to U+0418?  That's what I get while looking at
  UnicodeData.txt, and it would mean that the range of plain letters
  should not include both of them.
- It seems like we are missing a couple of letters after U+044F, like
  U+0454, U+0455 or U+0456, just to name three of them?

I have extracted from 0001 and applied the parts about the regression
tests for degree signs, while adding two more for SOUND RECORDING
COPYRIGHT (U+2117) and Black-Letter Capital H (U+210C), which is
translated to 'x' while it should probably be 'H'.
--
Michael
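
PS: As a rough way to double-check the decomposition data mentioned
above, here is a minimal Python sketch that uses the standard
unicodedata module, which mirrors UnicodeData.txt for the Unicode
version bundled with the interpreter.  The helper names base_letter
and decomposable_in_range are hypothetical, made up for this
illustration, and are not part of the patch.

```python
import unicodedata


def base_letter(ch):
    """Return the first codepoint of the canonical decomposition of ch,
    or ch itself when there is no canonical decomposition."""
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions carry a tag such as "<font>"; they are
    # skipped here, so only canonical decompositions are followed.
    if not decomp or decomp.startswith("<"):
        return ch
    return chr(int(decomp.split()[0], 16))


def decomposable_in_range(start, end):
    """List (codepoint, base) pairs for codepoints in [start, end] that
    have a canonical decomposition, i.e. that are not plain letters."""
    result = []
    for cp in range(start, end + 1):
        base = base_letter(chr(cp))
        if base != chr(cp):
            result.append((f"U+{cp:04X}", f"U+{ord(base):04X}"))
    return result


if __name__ == "__main__":
    # Codepoints in the proposed plain-letter range that decompose.
    print(decomposable_in_range(0x0410, 0x044F))
    # Letters after U+044F that have no decomposition at all.
    for cp in (0x0454, 0x0455, 0x0456):
        print(f"U+{cp:04X}", repr(unicodedata.decomposition(chr(cp))))
    # Black-Letter Capital H only has a tagged compatibility mapping.
    print(unicodedata.decomposition("\u210c"))
```

This only follows the first step of a canonical decomposition and
ignores compatibility mappings, so it is a sanity check on the range
rather than a replacement for the rule generation itself.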