Re: [HACKERS] Extra Vietnamese unaccent rules

2017-08-17 Thread Dang Minh Huong
Thanks! On 2017/08/17 11:56, Tom Lane wrote: Michael Paquier writes: On Thu, Aug 17, 2017 at 6:01 AM, Tom Lane wrote: I'm not really qualified to review the Python coding style, but I did fix a typo in a comment. No pythonist here, but a

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-08-16 Thread Tom Lane
Michael Paquier writes: > On Thu, Aug 17, 2017 at 6:01 AM, Tom Lane wrote: >> I'm not really qualified to review the Python coding >> style, but I did fix a typo in a comment. > No pythonist here, but a large confusing "if" condition without any >

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-08-16 Thread Michael Paquier
On Thu, Aug 17, 2017 at 6:01 AM, Tom Lane wrote: > Pushed into v11. Thanks. > I'm not really qualified to review the Python coding > style, but I did fix a typo in a comment. No pythonist here, but a large confusing "if" condition without any comments is better if split up

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-08-16 Thread Tom Lane
Dang Minh Huong writes: > On 2017/07/05 15:28, Michael Paquier wrote: >> (Surprised to see that generate_unaccent_rules.py is inconsistent on >> MacOS, runs fine on Linux). FWIW, I got identical results from running the script on current macOS (Sierra) and Linux (RHEL6). >>

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-07-05 Thread Dang Minh Huong
On 2017/07/05 15:28, Michael Paquier wrote: I have finally been able to look at this patch. Thanks for reviewing and the new version of the patch. (Surprised to see that generate_unaccent_rules.py is inconsistent on MacOS, runs fine on Linux). def get_plain_letter(codepoint, table):

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-07-05 Thread Michael Paquier
On Wed, Jun 7, 2017 at 1:06 AM, Man Trieu wrote: > 2017-06-07 0:31 GMT+09:00 Bruce Momjian : >> >> On Wed, Jun 7, 2017 at 12:10:25AM +0900, Dang Minh Huong wrote: >> > > On Jun 4, 29 Heisei, at 00:48, Bruce Momjian wrote: >> >

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Bruce Momjian
On Tue, Jun 6, 2017 at 12:15:13PM -0400, Tom Lane wrote: > Bruce Momjian writes: > > There seems to be a problem. I can't see a patch dated 2017-06-07 on > > the commitfest page: > > https://commitfest.postgresql.org/14/1161/ > > It looks to me like the patch is buried

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Tom Lane
Bruce Momjian writes: > There seems to be a problem. I can't see a patch dated 2017-06-07 on > the commitfest page: > https://commitfest.postgresql.org/14/1161/ It looks to me like the patch is buried inside a multipart/alternative MIME section. That's evidently causing

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Bruce Momjian
On Wed, Jun 7, 2017 at 01:06:22AM +0900, Man Trieu wrote: > 2017-06-07 0:31 GMT+09:00 Bruce Momjian : > I added the thread but there was no change.  (I think the thread was > already present.)  It appears it is not seeing this patch as the latest > patch. > >

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Man Trieu
2017-06-07 0:31 GMT+09:00 Bruce Momjian : > On Wed, Jun 7, 2017 at 12:10:25AM +0900, Dang Minh Huong wrote: > > > On Jun 4, 29 Heisei, at 00:48, Bruce Momjian wrote: > > Shouldn't you use "or is_letter_with_marks()", instead of "or > len(...) > > >

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Bruce Momjian
On Wed, Jun 7, 2017 at 12:10:25AM +0900, Dang Minh Huong wrote: > > On Jun 4, 29 Heisei, at 00:48, Bruce Momjian wrote: > Shouldn't you use "or is_letter_with_marks()", instead of "or len(...) > > 1"? Your test might catch something that isn't based on a 'letter' >

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-06 Thread Dang Minh Huong
On Jun 4, 29 Heisei, at 00:48, Bruce Momjian wrote: On Sun, Jun 4, 2017 at 12:43:17AM +0900, Dang Minh Huong wrote: On May 30, 29 Heisei, at 00:22, Dang Minh Huong wrote: On May 29, 29 Heisei, at 10:47, Thomas Munro

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-03 Thread Michael Paquier
On Mon, May 29, 2017 at 10:47 AM, Thomas Munro wrote: >> [Quoting Michael] >>> Actually, with the recent work that has been done with >>> unicode_norm_table.h which has been to transpose UnicodeData.txt into >>> user-friendly tables, shouldn't the python script of

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-06-03 Thread Dang Minh Huong
On May 30, 29 Heisei, at 00:22, Dang Minh Huong wrote: unaccent.patch Description: Binary data On May 29, 29 Heisei, at 10:47, Thomas Munro wrote: On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong wrote: Thanks for reporting

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-29 Thread Dang Minh Huong
> On May 29, 29 Heisei, at 10:47, Thomas Munro > wrote: > > On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong wrote: >> Thanks for reporting and lecture about unicode. >> I attached a patch as the instruction from Thomas. Could you confirm

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-28 Thread Thomas Munro
On Sun, May 28, 2017 at 7:55 PM, Dang Minh Huong wrote: > [Quoting Thomas] >> You don't have to worry about decoding that line, it's all done in >> that Python script. The problem is just in the function >> is_letter_with_marks(). Instead of just checking if
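The recursive check discussed in this thread can be sketched as follows. This is only an illustration under stated assumptions, not the code in generate_unaccent_rules.py: it reads decompositions from Python's standard unicodedata module instead of the script's own table, treats only ASCII letters as "plain", and the helper decomposition_of() is a hypothetical name introduced here.

```python
import string
import unicodedata


def is_mark(ch):
    """True for combining marks (general categories Mn, Me, Mc)."""
    return unicodedata.category(ch) in ("Mn", "Me", "Mc")


def is_plain_letter(ch):
    """Assumption for this sketch: a plain letter is an ASCII letter."""
    return ch in string.ascii_letters


def decomposition_of(ch):
    """Hypothetical helper: canonical decomposition as a list of characters.

    Returns an empty list when there is no decomposition or when it is a
    compatibility decomposition (those start with a tag such as "<compat>").
    """
    field = unicodedata.decomposition(ch)
    if not field or field.startswith("<"):
        return []
    return [chr(int(cp, 16)) for cp in field.split()]


def is_letter_with_marks(ch):
    """True if ch is a base letter followed by marks, following the base recursively."""
    parts = decomposition_of(ch)
    if len(parts) < 2:
        return False
    base, marks = parts[0], parts[1:]
    if not all(is_mark(m) for m in marks):
        return False
    # The base may itself be a letter with marks, e.g. U+00E2 inside U+1EA5.
    return is_plain_letter(base) or is_letter_with_marks(base)


def get_plain_letter(ch):
    """Strip every level of marks; only valid when is_letter_with_marks(ch) is true."""
    base = decomposition_of(ch)[0]
    return base if is_plain_letter(base) else get_plain_letter(base)


if __name__ == "__main__":
    for ch in ("\u00e9", "\u1ea5", "a"):
        if is_letter_with_marks(ch):
            print(ch, "->", get_plain_letter(ch))
        else:
            print(ch, "is not a letter with marks")
```

With a single-level check, U+1EA5 would be rejected because its base U+00E2 is not a plain letter; following the base one more level is what lets characters with two accents through.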

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-28 Thread Dang Minh Huong
Hi, unaccent.patch Description: Binary data I am interested in this thread. On May 27, 29 Heisei, at 10:41, Michael Paquier wrote: On Fri, May 26, 2017 at 5:48 PM, Thomas Munro wrote: Unicode has two ways to represent characters with

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-27 Thread Kha Nguyen
Does this mean that the python script has to be updated to be recursive too? > On 27 May 2017, at 0.48, Thomas Munro wrote: > > On Sat, May 27, 2017 at 9:09 AM, Kha Nguyen wrote: >> Could you explain to me what this line means: >> “ >>

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-27 Thread Kha Nguyen
Could you explain to me what this line means: “ 1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2 0301;;;;N;;;1EA4;;1EA4 “ If you could give me an example of adding a rule for “recursive” case, I can do the rest. I am not familiar with this unaccent format generation yet. Thanks
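For readers with the same question, the entry can be split into its fields as below. This is a minimal sketch assuming the standard UnicodeData.txt layout of 15 semicolon-separated fields; the variable names are illustrative and do not come from the patch.

```python
# The UnicodeData.txt entry for U+1EA5, written out with all 15 fields.
line = ("1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;"
        "Ll;0;L;00E2 0301;;;;N;;;1EA4;;1EA4")

fields = line.split(";")

codepoint = int(fields[0], 16)       # field 0: the codepoint itself
name = fields[1]                     # field 1: character name
general_category = fields[2]         # field 2: "Ll" = lowercase letter
# Field 5: the decomposition, a space-separated list of hex codepoints.
decomposition = [int(cp, 16) for cp in fields[5].split()]
uppercase = fields[12]               # field 12: simple uppercase mapping

print(hex(codepoint), name, general_category)
print([hex(cp) for cp in decomposition])
print("uppercase:", uppercase)
```

The decomposition field is the part that matters for unaccent: U+1EA5 decomposes to U+00E2 (a with circumflex) followed by U+0301 (combining acute accent), and U+00E2 itself decomposes further.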

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-26 Thread Michael Paquier
On Fri, May 26, 2017 at 5:48 PM, Thomas Munro wrote: > Unicode has two ways to represent characters with accents: either with > composed codepoints like "é" or decomposed codepoints where you say > "e" and then "´". The field "00E2 0301" is the decomposed form of >
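The two representations described in this message can be checked with Python's standard unicodedata module. This is a small illustration only, not part of the patch under discussion.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single composed codepoint
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Normalization converts between the two forms.
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed

# The decomposition field of U+1EA5 names another composed character,
# U+00E2, which has a decomposition of its own.
first_level = unicodedata.decomposition("\u1ea5")
second_level = unicodedata.decomposition("\u00e2")
print(first_level)
print(second_level)

# Full decomposition (NFD) applies both steps at once.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "\u1ea5")])
```

The second lookup is why a one-level decomposition is not enough for Vietnamese letters that carry two accents.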

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-26 Thread Thomas Munro
On Sat, May 27, 2017 at 9:09 AM, Kha Nguyen wrote: > Could you explain to me what this line means: > “ > 1EA5;LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE;Ll;0;L;00E2 > 0301;;;;N;;;1EA4;;1EA4 > “ > > If you could give me an example of adding a rule for “recursive” case, I can

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-26 Thread Thomas Munro
On Sat, May 27, 2017 at 5:13 AM, Tom Lane wrote: > I wrote: >> Nguyen Le Hoang Kha writes: >>> Most of the time in Vietnamese language, there are up to 2 accents in a >>> character. These unaccent rules are added to handle such cases (which are >>> very

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-26 Thread Tom Lane
I wrote: > Nguyen Le Hoang Kha writes: >> Most of the time in Vietnamese language, there are up to 2 accents in a >> character. These unaccent rules are added to handle such cases (which are >> very common). > I can't see any reason not to add these --- any objections out

Re: [HACKERS] Extra Vietnamese unaccent rules

2017-05-26 Thread Tom Lane
Nguyen Le Hoang Kha writes: > Most of the time in Vietnamese language, there are up to 2 accents in a > character. These unaccent rules are added to handle such cases (which are > very common). I can't see any reason not to add these --- any objections out there?