Re: [HACKERS] Extra Vietnamese unaccent rules

Dang Minh Huong Mon, 29 May 2017 08:23:41 -0700

> On May 29, 2017, at 10:47, Thomas Munro <[email protected]> wrote:
> 
> On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong <[email protected]> wrote:
>> Thanks for the report and the explanation about Unicode.
>> I have attached a patch following Thomas's instructions. Could you confirm it?
> 
> -           is_plain_letter(table[codepoint.combining_ids[0]]) and \
> +           (is_plain_letter(table[codepoint.combining_ids[0]]) or\
> +            len(table[codepoint.combining_ids[0]].combining_ids) > 1) and \
> 
> Shouldn't you use "or is_letter_with_marks()", instead of
> "or len(...) > 1"?  Your test might catch something that isn't based on
> a 'letter' (according to is_plain_letter).  Otherwise this looks pretty
> good to me.  Please add it to the next commitfest.


Thanks for confirming.
I will add it to the next CF soon.
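
To illustrate the difference between the two tests, here is a minimal
standalone sketch. It is not the code from the patch: it reads
decompositions from Python's unicodedata module instead of the script's
codepoint table, and the helper combining_ids() and the one-argument
signatures are assumptions made for this example only.

```python
import unicodedata


def combining_ids(cp):
    """Canonical decomposition of code point cp as a list of ints.

    Hypothetical helper for this sketch; returns [] when there is no
    decomposition or only a compatibility one (tagged like "<compat>").
    """
    fields = unicodedata.decomposition(chr(cp)).split()
    if not fields or fields[0].startswith("<"):
        return []
    return [int(f, 16) for f in fields]


def is_plain_letter(cp):
    """True for plain ASCII letters only."""
    return ord("a") <= cp <= ord("z") or ord("A") <= cp <= ord("Z")


def is_mark(cp):
    """True for combining marks (general categories Mn, Me, Mc)."""
    return unicodedata.category(chr(cp)) in ("Mn", "Me", "Mc")


def is_letter_with_marks(cp):
    """True if cp is a plain letter plus marks, possibly in several layers."""
    ids = combining_ids(cp)
    if len(ids) < 2:
        return False
    base, marks = ids[0], ids[1:]
    if not all(is_mark(m) for m in marks):
        return False
    # Recursing on the base, rather than only checking its length,
    # guarantees that the chain really ends in a plain letter.
    return is_plain_letter(base) or is_letter_with_marks(base)


def get_plain_letter(cp):
    """Follow the first decomposition element down to the plain letter."""
    while not is_plain_letter(cp):
        cp = combining_ids(cp)[0]
    return cp
```

With this sketch, U+1EBF (e with circumflex and acute) is accepted because
it decomposes to U+00EA plus a mark, and U+00EA in turn decomposes to "e"
plus a mark. U+0390 (Greek iota with dialytika and tonos) has the same
two-level shape, so a "len(...) > 1" test on its first element would let it
through, but the recursive test rejects it because its base is not a plain
letter.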

> I expect that some users in Vietnam will consider this to be a bugfix,
> which raises the question of whether to backpatch it.  Perhaps we
> could consider fixing it for 10.  Then users of older versions could
> grab the rules file from 10 to use with 9.whatever if they want to do
> that and reindex their data as appropriate.

I am also inclined to fix it for 10, because that will not affect
current users.
But do you want it back-patched to all supported versions, Kha Nguyen?
# I would also like to note that Vietnamese characters are not the only ones
missing from the rule list.


---
Thanks and best regards,
Dang Minh Huong
