> On May 29, 29 Heisei, at 10:47, Thomas Munro <thomas.mu...@enterprisedb.com>
> wrote:
>
> On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong <kakalo...@gmail.com> wrote:
>> Thanks for reporting and lecture about unicode.
>> I attached a patch as the instruction from Thomas. Could you confirm it.
>
> - is_plain_letter(table[codepoint.combining_ids]) and \
> + (is_plain_letter(table[codepoint.combining_ids]) or\
> + len(table[codepoint.combining_ids].combining_ids) > 1) and \
>
> Shouldn't you use "or is_letter_with_marks()", instead of "or len(...) > 1"?
> Your test might catch something that isn't based on a 'letter'
> (according to is_plain_letter). Otherwise this looks pretty good to
> me. Please add it to the next commitfest.
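For context on the review point: Vietnamese ế (U+1EBF) canonically decomposes to ê + combining acute, and ê itself decomposes to e + combining circumflex, so a check that only accepts a plain-letter base skips it. Below is a minimal, hypothetical Python sketch of a recursive is_letter_with_marks() check of the kind Thomas suggests. It is not the actual rules-generation script: it assumes Python's unicodedata module instead of parsing UnicodeData.txt, treats only ASCII letters as "plain", and combining_ids() and plain_letter() are illustrative helper names.

```python
import unicodedata


def combining_ids(cp):
    """Canonical decomposition of cp as a list of code points (hypothetical helper).

    Compatibility decompositions (tagged like '<compat>') are ignored, and a
    code point without a canonical decomposition maps to [cp].
    """
    decomp = unicodedata.decomposition(chr(cp))
    if not decomp or decomp.startswith("<"):
        return [cp]
    return [int(part, 16) for part in decomp.split()]


def is_plain_letter(cp):
    # Simplifying assumption: only ASCII letters count as "plain".
    ch = chr(cp)
    return ch.isascii() and ch.isalpha()


def is_mark(cp):
    # Diacritical marks have general category Mn, Mc or Me.
    return unicodedata.category(chr(cp)) in ("Mn", "Mc", "Me")


def is_letter_with_marks(cp):
    """True if cp is a base followed only by marks, where the base is a plain
    letter or (recursively) itself a letter with marks."""
    ids = combining_ids(cp)
    if len(ids) < 2 or not all(is_mark(i) for i in ids[1:]):
        return False
    base = ids[0]
    return is_plain_letter(base) or is_letter_with_marks(base)


def plain_letter(cp):
    """Follow base code points down to the underlying plain letter."""
    while not is_plain_letter(cp):
        ids = combining_ids(cp)
        if len(ids) == 1:
            raise ValueError(f"U+{cp:04X} does not reduce to a plain letter")
        cp = ids[0]
    return cp


if __name__ == "__main__":
    for ch in "éếậ≠":
        if is_letter_with_marks(ord(ch)):
            print(ch, chr(plain_letter(ord(ch))))
```

Because the recursion has to bottom out in a plain letter, a code point such as ≠ (U+2260, "=" plus a combining overlay) is rejected even though it decomposes into a base plus a mark, which is the kind of non-letter case a bare length test on the decomposition does not rule out.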
Thanks for confirming, sir. I will add it to the next CF soon.

> I expect that some users in Vietnam will consider this to be a bugfix,
> which raises the question of whether to backpatch it. Perhaps we
> could consider fixing it for 10. Then users of older versions could
> grab the rules file from 10 to use with 9.whatever if they want to do
> that and reindex their data as appropriate.

I am also inclined to fix it for 10, because that will not affect current users.
But Kha Nguyen, do you want it back-patched to all supported versions?

# I would also like to note that Vietnamese characters are not the only ones missing from the rules list.

---
Thanks and best regards,
Dang Minh Huong