Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tatsuo Ishii
Tatsuo Ishii is...@postgresql.org writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes: Done along with comment that we follow emacs's implementation, not xemacs's. Well, when the preceding comment block contains five references to xemacs and the link for more information leads to www.xemacs.org, I don't think it's real helpful to add one

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-10 Thread Tatsuo Ishii
Well, when the preceding comment block contains five references to xemacs and the link for more information leads to www.xemacs.org, I don't think it's real helpful to add one sentence saying oh by the way we're not actually following xemacs. I continue to think that we'd be better off to

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-08 Thread Tatsuo Ishii
Tatsuo Ishii is...@postgresql.org writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-07 Thread Tatsuo Ishii
Tatsuo Ishii is...@postgresql.org writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: [ new patch ] With the improved comments in pg_wchar.h, it seemed clear what needed to be done here, so I fixed up the MULE conversion and committed this. I'd

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Robert Haas
On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: [ new patch ] With the improved comments in pg_wchar.h, it seemed clear what needed to be done here,

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hm, several of these routines seem to neglect to advance the from pointer? Err... yeah. That's not a bug I introduced, but I should have caught it... and it does make me wonder how

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Tatsuo Ishii
Tatsuo Ishii is...@postgresql.org writes: So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-05 Thread Robert Haas
On Thu, Jul 5, 2012 at 8:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 5, 2012 at 7:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Hm, several of these routines seem to neglect to advance the from pointer? Err... yeah. That's not a bug I

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-04 Thread Robert Haas
On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: [ new patch ] With the improved comments in pg_wchar.h, it seemed clear what needed to be done here, so I fixed up the MULE conversion and committed this. I'd appreciate it if someone would check my work, but I think

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-04 Thread Tatsuo Ishii
On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: [ new patch ] With the improved comments in pg_wchar.h, it seemed clear what needed to be done here, so I fixed up the MULE conversion and committed this. I'd appreciate it if someone would check my work, but

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tatsuo Ishii
OK. So, in that case, I suggest that if the leading byte is non-zero, we emit 0x9d followed by the three available bytes, instead of first testing whether the first byte is >= 0xf0. That test seems to serve no purpose but to confuse the issue. Probably the code should look like this (see below

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Alexander Korotkov
On Tue, Jul 3, 2012 at 10:17 AM, Tatsuo Ishii is...@postgresql.org wrote: OK. So, in that case, I suggest that if the leading byte is non-zero, we emit 0x9d followed by the three available bytes, instead of first testing whether the first byte is >= 0xf0. That test seems to serve no

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes: It's likely we also need to assign some names to all these numbers (0xf0, 0xf4, 0xfe, 0x9c, 0x9d). But it's hard for me to invent such names. The encoding ID byte values already have names (see pg_wchar.h), but the private prefix bytes don't. I

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tatsuo Ishii
I have added comments about the mule internal encoding, based on my memory and on an old document found on the web (http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). Any objection to applying my patch? -- Tatsuo Ishii SRA OSS, Inc. Japan English:

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes: I have added comments about mule internal encoding by refreshing my memory and from old document found on web(http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). Any objection to apply my patch? It needs a bit of

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tom Lane
I wrote: Tatsuo Ishii is...@postgresql.org writes: I have added comments about mule internal encoding by refreshing my memory and from old document found on web(http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string). Any objection to apply my patch? It needs a

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-03 Thread Tatsuo Ishii
So far as I can see, the only LCPRVn marker code that is actually in use right now is 0x9d --- there are no instances of 9a, 9b, or 9c that I can find. I also read in the xemacs internals doc, at http://www.xemacs.org/Documentation/21.5/html/internals_26.html#SEC145 that XEmacs thinks the

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: MULE also looks problematic. The code that you've written isn't symmetric with the opposite conversion, unlike what you did in all other cases, and I don't understand why. I'm also somewhat baffled by the reverse

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Alexander Korotkov
On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: MULE also looks problematic. The code that you've written isn't symmetric with the opposite conversion, unlike what you did in all other

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 4:33 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov aekorot...@gmail.com wrote: MULE also looks problematic. The code that you've written

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Alexander Korotkov
On Tue, Jul 3, 2012 at 12:37 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 2, 2012 at 4:33 PM, Alexander Korotkov aekorot...@gmail.com wrote: On Mon, Jul 2, 2012 at 8:12 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jul 1, 2012 at 5:11 AM, Alexander Korotkov

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 4:46 PM, Alexander Korotkov aekorot...@gmail.com wrote: So, I provided such transformation in versions 0.3 and 0.4 based on explanation from Tatsuo Ishii. The problem is that both conversions are nontrivial and it's not evident that they are mirror (understanding that

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tatsuo Ishii
Yeah, I did. I think I may be a bit confused here, so let me try to understand this a bit better. It seems like pg_mule2wchar_with_len uses the following algorithm: - If the first character IS_LC1 (0x81-0x8d), decode two bytes, stored with shifts of 16 and 0. - If the first character
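
The LC1 step described above (a leading byte in 0x81-0x8d, two bytes decoded, stored with shifts of 16 and 0) can be sketched as follows. This is a minimal illustration, not the code from pg_mule2wchar_with_len itself: the helper names decode_lc1 and encode_lc1 are hypothetical, and uint32_t stands in for pg_wchar as an assumption.

```c
#include <stdint.h>

/* Leading-byte range for LC1, as quoted in the message above. */
#define IS_LC1(c) ((unsigned char)(c) >= 0x81 && (unsigned char)(c) <= 0x8d)

/*
 * Hypothetical helper: decode one two-byte LC1 sequence into a wide value.
 * Returns the number of input bytes consumed, or 0 if the input is not LC1
 * or is too short.
 */
static int
decode_lc1(const unsigned char *from, int len, uint32_t *to)
{
    if (len < 2 || !IS_LC1(from[0]))
        return 0;
    *to = (uint32_t) from[0] << 16;     /* leading byte, shift 16 */
    *to |= from[1];                     /* character byte, shift 0 */
    return 2;
}

/*
 * Hypothetical inverse: write the two bytes back out.
 * Returns the number of bytes written, or 0 if the value is not LC1.
 */
static int
encode_lc1(uint32_t wc, unsigned char *to)
{
    unsigned char lb = (wc >> 16) & 0xff;

    if (!IS_LC1(lb))
        return 0;
    to[0] = lb;
    to[1] = wc & 0xff;
    return 2;
}
```

Keeping the decoder and encoder side by side makes it easy to check that the two directions mirror each other, which is the symmetry concern raised in this thread.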

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: In the reverse transformation implemented by pg_wchar2mule_with_len, if the byte stored with shift 16 IS_LC1 or IS_LC2, then we decode 2 or 3 bytes, respectively, exactly as I would expect. ASCII decoding is also as I would expect. The case I don't

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Tom Lane
I wrote: Some inspection of pg_wchar.h suggests that the IS_LCPRV1 and IS_LCPRV2 cases are unused: the file doesn't define any encoding labels that match the byte values they accept, nor do the comments suggest that Emacs has any such labels either. Scratch that --- I was misled by the fond

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-02 Thread Robert Haas
On Mon, Jul 2, 2012 at 7:33 PM, Tatsuo Ishii is...@postgresql.org wrote: Yeah, I did. I think I may be a bit confused here, so let me try to understand this a bit better. It seems like pg_mule2wchar_with_len uses the following algorithm: - If the first character IS_LC1 (0x81-0x8d), decode

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-07-01 Thread Alexander Korotkov
On Wed, Jun 27, 2012 at 11:35 PM, Robert Haas robertmh...@gmail.com wrote: It looks to me like pg_wchar2utf_with_len will not work, because unicode_to_utf8 returns its second argument unmodified - not, as your code seems to assume, the byte following what was already written. Fixed. MULE
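
The pitfall raised in the review can be sketched as follows: when an encoder returns its buffer argument unmodified, the caller has to advance the output pointer by the encoded length itself. This is an illustration under stated assumptions, not the patch's code: encode_utf8, utf8_len and wchars_to_utf8 are hypothetical stand-ins written for this example, and uint32_t stands in for pg_wchar.

```c
#include <stdint.h>

/*
 * Hypothetical encoder that, like the behaviour described in the review,
 * returns its second argument unmodified (the start of what it wrote).
 */
static unsigned char *
encode_utf8(uint32_t c, unsigned char *buf)
{
    if (c <= 0x7F)
        buf[0] = (unsigned char) c;
    else if (c <= 0x7FF)
    {
        buf[0] = (unsigned char) (0xC0 | ((c >> 6) & 0x1F));
        buf[1] = (unsigned char) (0x80 | (c & 0x3F));
    }
    else if (c <= 0xFFFF)
    {
        buf[0] = (unsigned char) (0xE0 | ((c >> 12) & 0x0F));
        buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (c & 0x3F));
    }
    else
    {
        buf[0] = (unsigned char) (0xF0 | ((c >> 18) & 0x07));
        buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char) (0x80 | (c & 0x3F));
    }
    return buf;
}

/* Length of a UTF-8 sequence, derived from its first byte. */
static int
utf8_len(unsigned char first)
{
    if ((first & 0x80) == 0)
        return 1;
    if ((first & 0xE0) == 0xC0)
        return 2;
    if ((first & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/*
 * Convert up to len code points to UTF-8 and NUL-terminate the output.
 * Returns the number of bytes written, not counting the terminator.
 */
static int
wchars_to_utf8(const uint32_t *from, unsigned char *to, int len)
{
    int cnt = 0;

    while (len > 0 && *from)
    {
        int n;

        encode_utf8(*from, to);     /* return value is just "to" again */
        n = utf8_len(*to);          /* so compute the length separately */
        to += n;                    /* advance the output pointer */
        cnt += n;
        from++;                     /* and do not forget the input pointer */
        len--;
    }
    *to = 0;
    return cnt;
}
```

The loop also advances the input pointer explicitly, since a missing increment on that side is the other bug discussed later in this thread.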

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-06-27 Thread Robert Haas
On Thu, May 24, 2012 at 12:04 AM, Alexander Korotkov aekorot...@gmail.com wrote: Thanks. I rewrote inverse conversion from pg_wchar to mule. New version of patch is attached. Review: It looks to me like pg_wchar2utf_with_len will not work, because unicode_to_utf8 returns its second argument

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-28 Thread Tatsuo Ishii
On Tue, May 22, 2012 at 3:27 PM, Tatsuo Ishii is...@postgresql.org wrote: Thanks for your comments. They clarify a lot. But I still don't understand how we can distinguish IS_LCPRV2 and IS_LC2. Isn't it possible for them to produce the same pg_wchar? If LB is in the 0x90 - 0x99 range, then they

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-23 Thread Alexander Korotkov
On Tue, May 22, 2012 at 3:27 PM, Tatsuo Ishii is...@postgresql.org wrote: Thanks for your comments. They clarify a lot. But I still don't understand how we can distinguish IS_LCPRV2 and IS_LC2. Isn't it possible for them to produce the same pg_wchar? If LB is in the 0x90 - 0x99 range, then they are

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Tatsuo Ishii
Hi Alexander, It was good seeing you in Ottawa! Hello, Ishii-san! We talked at PGCon about my questions on the mule to wchar conversion. My questions about the pg_mule2wchar_with_len function are as follows. In these parts of the code: else if (IS_LCPRV1(*from) && len >= 3) { from++;

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Alexander Korotkov
On Tue, May 22, 2012 at 11:50 AM, Tatsuo Ishii is...@postgresql.org wrote: I think it's possible. The first characters are defined like this: #define IS_LCPRV1(c) ((unsigned char)(c) == 0x9a || (unsigned char)(c) == 0x9b) #define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-22 Thread Tatsuo Ishii
Thanks for your comments. They clarify a lot. But I still don't understand how we can distinguish IS_LCPRV2 and IS_LC2. Isn't it possible for them to produce the same pg_wchar? If LB is in the 0x90 - 0x99 range, then they are LC2. If LB is in the 0xf0 - 0xff range, then they are LCPRV2. -- Tatsuo Ishii SRA
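
The rule stated in this reply can be sketched as a small classifier on LB. This is only an illustration of the two ranges given above; the enum and the function name classify_lb are hypothetical and do not appear in pg_wchar.h.

```c
/* Hypothetical labels for the two cases distinguished in the reply above. */
typedef enum
{
    MULE_LC2,       /* LB in 0x90 - 0x99 */
    MULE_LCPRV2,    /* LB in 0xf0 - 0xff */
    MULE_OTHER      /* anything else; not covered by this sketch */
} mule_class;

/* Classify a stored LB value using the ranges quoted in the message. */
static mule_class
classify_lb(unsigned char lb)
{
    if (lb >= 0x90 && lb <= 0x99)
        return MULE_LC2;
    if (lb >= 0xf0)             /* unsigned char, so the upper bound is 0xff */
        return MULE_LCPRV2;
    return MULE_OTHER;
}
```

Because the two ranges do not overlap, a value decoded from either form can be mapped back to the right multibyte form by inspecting LB alone.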

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-21 Thread Alexander Korotkov
Hello, Ishii-san! We talked at PGCon about my questions on the mule to wchar conversion. My questions about the pg_mule2wchar_with_len function are as follows. In these parts of the code: else if (IS_LCPRV1(*from) && len >= 3) { from++; *to = *from++ << 16; *to |= *from++; len -= 3; }

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov aekorot...@gmail.com wrote: Right. When the number of trigrams is large, it is slow to scan the posting lists of all of them. The solution in this case is to exclude the most frequent trigrams from the index scan. But it requires some kind of statistics of

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 4:50 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, May 1, 2012 at 6:02 PM, Alexander Korotkov aekorot...@gmail.com wrote: Right. When the number of trigrams is large, it is slow to scan the posting lists of all of them. The solution in this case is to exclude the most

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Robert Haas
On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov aekorot...@gmail.com wrote: I was thinking you could perhaps do it just based on the *number* of trigrams, not necessarily their frequency. Imagine we've two queries: 1) SELECT * FROM tbl WHERE col LIKE '%abcd%'; 2) SELECT * FROM tbl WHERE

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-02 Thread Alexander Korotkov
On Wed, May 2, 2012 at 5:48 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov aekorot...@gmail.com wrote: Imagine we've two queries: 1) SELECT * FROM tbl WHERE col LIKE '%abcd%'; 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%'; The

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
Hi Erik On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers e...@xs4all.nl wrote: Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make check was without problem). -- 3 instances: HEAD

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers e...@xs4all.nl wrote: Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-05-01 Thread Alexander Korotkov
On Tue, May 1, 2012 at 1:48 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: My biggest complaint is related to setting the threshold for the % operator. It seems to me that there should be a GUC to control the default, and that there should be a way to set the threshold for each %

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-30 Thread Robert Haas
On Sun, Apr 29, 2012 at 8:12 AM, Erik Rijkers e...@xs4all.nl wrote: Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make check was without problem). These test results seem to be more about the pg_trgm

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-30 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote: Hopefully that's not too hard to fix; the basic approach seems quite promising. After playing with trigram searches for name searches against copies of production database with appropriate indexing, our shop has chosen it as the new way to do name

Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

2012-04-29 Thread Erik Rijkers
Hi Alexander, Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three instances. (the patches compiled fine, and make check was without problem). -- 3 instances: HEAD port 6542 trgm_regex port 6547 HEAD + trgm-regexp patch (22