Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-30 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Oh! You are right, I was expecting it to try multiple characters at the same position before truncating the string. This change seems to have lobotomized things rather thoroughly.

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-30 Thread Robert Haas
On Sun, Oct 30, 2011 at 10:58 AM, Tom Lane t...@sss.pgh.pa.us wrote: You are misreading the old code.  The inner loop increments the last byte, checks for success, and if it hasn't produced a greater string then it loops around to increment again. Ugh. You're right. -- Robert Haas
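
A minimal sketch of the loop structure described above, not the PostgreSQL source. It works on raw bytes and uses plain bytewise comparison. The name `make_greater_bytes` and the `accept` callback are hypothetical; `accept` stands in for whatever check (collation comparison, encoding validity) decides that an incremented candidate is usable.

```python
from typing import Callable, Optional


def make_greater_bytes(
    prefix: bytes,
    accept: Callable[[bytes], bool] = lambda candidate: True,
) -> Optional[bytes]:
    """Return a byte string that compares greater than `prefix`, or None."""
    work = bytearray(prefix)
    while work:
        # Inner loop: increment the last byte, check for success, and if the
        # candidate is not usable, loop around and increment again.
        while work[-1] < 0xFF:
            work[-1] += 1
            candidate = bytes(work)
            if candidate > prefix and accept(candidate):
                return candidate
        # Every value at this position failed: truncate and retry one
        # position to the left.
        work.pop()
    return None
```

With the default `accept`, `make_greater_bytes(b"ab\xff")` has nothing left to try at the last position, so it truncates and returns `b"ac"`.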

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 1:16 PM, horiguchi.kyot...@oss.ntt.co.jp wrote: Hello, I feel at a loss what to do... I thought that code was looking for 0xED/0xF4 in the second position, but it's actually looking for them in the first position, which makes vastly more sense.  Whee! Anyway, I try

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I've committed this, after a good deal of hacking on the comments, some coding style cleanup, and one bug fix: Ummm ... why do the incrementer functions think they need to restore the previous value on failure? AFAICS that's a waste of code and cycles,

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I've committed this, after a good deal of hacking on the comments, some coding style cleanup, and one bug fix: Ummm ... why do the incrementer functions think they need to restore the

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Sat, Oct 29, 2011 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ummm ... why do the incrementer functions think they need to restore the previous value on failure? AFAICS that's a waste of code and cycles, since there is only one caller and it

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-29 Thread Robert Haas
On Sat, Oct 29, 2011 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, it might not be strictly necessary for pg_utf8_increment() and pg_eucjp_increment(), but it's clearly necessary for the generic incrementer function for exactly the same reason it was needed in the old coding.  I suppose

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-22 Thread Robert Haas
On Thu, Oct 20, 2011 at 9:36 PM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: This must be the basis of the behavior of pg_utf8_verifier(), and pg_utf8_increment() has taken over it. It may be good to describe this origin of the special handling as comment of these functions to

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-20 Thread Kyotaro HORIGUCHI
Hello, Robert Haas robertmh...@gmail.com writes: - Why does the second byte need special handling for 0xED and 0xF4? http://www.faqs.org/rfcs/rfc3629.html See section 4 in particular.  The underlying requirement is to disallow multiple representations of the same Unicode code point.

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-18 Thread Robert Haas
On Tue, Oct 18, 2011 at 12:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Oct 17, 2011 at 11:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: http://www.faqs.org/rfcs/rfc3629.html I'm still confused.  The input string is already known to be valid UTF-8,

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Robert Haas
On Wed, Oct 12, 2011 at 11:45 PM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Hello, the work is finished.  Version 4 of the patch is attached to this message. I went through this in a bit more detail tonight and am cleaning it up. But I'm a bit confused, looking at

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: - Why does the second byte need special handling for 0xED and 0xF4? http://www.faqs.org/rfcs/rfc3629.html See section 4 in particular. The underlying requirement is to disallow multiple representations of the same Unicode code point.
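
As an illustration of the section 4 grammar in RFC 3629, the sketch below lists the permitted second-byte range for each UTF-8 lead byte. The function name `second_byte_range` is hypothetical and is not taken from the patch under discussion; only the byte ranges come from the RFC.

```python
from typing import Tuple


def second_byte_range(first: int) -> Tuple[int, int]:
    """Inclusive range of valid second bytes after UTF-8 lead byte `first`."""
    if 0xC2 <= first <= 0xDF:
        return (0x80, 0xBF)  # two-byte sequences
    if first == 0xE0:
        return (0xA0, 0xBF)  # lower values would be overlong encodings
    if 0xE1 <= first <= 0xEC or 0xEE <= first <= 0xEF:
        return (0x80, 0xBF)
    if first == 0xED:
        return (0x80, 0x9F)  # 0xA0..0xBF would encode U+D800..U+DFFF
    if first == 0xF0:
        return (0x90, 0xBF)  # lower values would be overlong encodings
    if 0xF1 <= first <= 0xF3:
        return (0x80, 0xBF)
    if first == 0xF4:
        return (0x80, 0x8F)  # higher values would exceed U+10FFFF
    raise ValueError("not a UTF-8 lead byte of a multibyte sequence")
```

An incrementer that bumps the second byte therefore cannot assume the full 0x80..0xBF range after 0xE0, 0xED, 0xF0, or 0xF4.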

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Robert Haas
On Mon, Oct 17, 2011 at 11:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: - Why does the second byte need special handling for 0xED and 0xF4? http://www.faqs.org/rfcs/rfc3629.html See section 4 in particular.  The underlying requirement is to disallow

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Mon, Oct 17, 2011 at 11:54 PM, Tom Lane t...@sss.pgh.pa.us wrote: http://www.faqs.org/rfcs/rfc3629.html I'm still confused. The input string is already known to be valid UTF-8, so the second byte (if there is one) must be between 0x80 and 0xBF.

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-12 Thread Kyotaro HORIGUCHI
Hello, the work is finished. Version 4 of the patch is attached to this message. - Added a rough description of the algorithm as a comment on pg_utf8_increment() and pg_eucjp_increment(). - Fixed a bug in pg_utf8_increment() found while working. pg_(utf8|eucjp)_increment are retested on

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-11 Thread Kyotaro HORIGUCHI
At Fri, 7 Oct 2011 12:25:08 -0400, Robert Haas robertmh...@gmail.com wrote in ca+tgmoao2oozbmusfp3zc0_lgxsv3jbvy9eyr5h+czyez7j...@mail.gmail.com OK, I think this is reasonably close to being committable now. There are a few remaining style and grammar mistakes but I can fix those up before

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-07 Thread Robert Haas
On Fri, Oct 7, 2011 at 12:22 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Thank you for reviewing. The new version of this patch is attached to this message. OK, I think this is reasonably close to being committable now. There are a few remaining style and grammar mistakes but

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-06 Thread Kyotaro HORIGUCHI
Thank you for reviewing. The new version of this patch is attached to this message. But it seems to me that if the datatype is BYTEAOID then there's no need to restore anything at all, because we're not going to call pg_mbcliplen() in that case anyway.  So I think the logic here can be

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-03 Thread Robert Haas
On Thu, Sep 29, 2011 at 6:24 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: This is new version of make_greater_string patch. According to the comments in the original source code, the purpose of savelastchar is to avoid confusing pg_mbcliplen(). You've preserved savelastchar only

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-10-03 Thread Robert Haas
On Mon, Oct 3, 2011 at 2:13 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Sep 29, 2011 at 6:24 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: This is new version of make_greater_string patch. According to the comments in the original source code, the purpose of

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-29 Thread Kyotaro HORIGUCHI
This is a new version of the make_greater_string patch. 1. wchar.c:1532 pg_wchar_table: Restore the pg_wchar_table. 2. wchar.c:1371 pg_utf8_increment: Remove the dangerous memcpy, but one memcpy is left because it's safe. Remove the code check after increment. 3. wchar.c:1429 pg_eucjp_increment: Remove

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-26 Thread Peter Eisentraut
On fre, 2011-09-23 at 20:35 +0300, Marcin Mańk wrote: One idea: col like 'foo%' could be translated to col >= 'foo' and col <= foo || 'zzz' , where 'z' is the largest possible character. This should be good enough for calculating stats. How to find such a character, I do not know. That's

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-26 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-23 at 20:35 +0300, Marcin Mańk wrote: One idea: col like 'foo%' could be translated to col >= 'foo' and col <= foo || 'zzz' , where 'z' is the largest possible character. This should be good enough for calculating stats. How to find

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-26 Thread Peter Eisentraut
On mån, 2011-09-26 at 10:08 -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On fre, 2011-09-23 at 20:35 +0300, Marcin Mańk wrote: One idea: col like 'foo%' could be translated to col >= 'foo' and col <= foo || 'zzz' , where 'z' is the largest possible character. This should

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-26 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: On mån, 2011-09-26 at 10:08 -0400, Tom Lane wrote: No, it's a hundred times worse than that, because in collations other than C there typically *is* no total order. The collation behavior of many characters is context-sensitive, thanks to the

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Kyotaro HORIGUCHI
Hi, I think I have roughly comprehended the constructs and the underlying concept. At Thu, 22 Sep 2011 12:35:56 -0400, Tom Lane t...@sss.pgh.pa.us wrote in 23159.1316709...@sss.pgh.pa.us: Sure, if the "increment the top byte" strategy proves to not accomplish that effectively. But

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Robert Haas
On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, I won't stand in the way of the patch as long as it's modified to limit the number of values considered for any one character position to something reasonably small. I think that limit in both the old and new code is

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Robert Haas
On Fri, Sep 23, 2011 at 5:16 AM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: Can I have another chance to show the another version of the patch according to the above? You can always post a new version of any patch. I think what you need to focus on is cleaning up the coding style

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, I won't stand in the way of the patch as long as it's modified to limit the number of values considered for any one character position to something reasonably small. I think

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Robert Haas
On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, I won't stand in the way of the patch as long as it's modified to limit the number of values considered for

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-23 Thread Marcin Mańk
One idea: col like 'foo%' could be translated to col >= 'foo' and col <= foo || 'zzz' , where 'z' is the largest possible character. This should be good enough for calculating stats. How to find such a character, I do not know.
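
A rough sketch of the idea above, read as a range condition on the column. It assumes plain code-point ordering (Python's default string comparison), which is not how most non-C collations behave. The names `prefix_range` and `like_prefix_via_range`, the choice of U+10FFFF as the "largest possible character", and the padding length of 3 are all illustrative assumptions.

```python
from typing import List, Tuple

MAX_CHAR = "\U0010FFFF"  # assumed "largest" character under code-point order


def prefix_range(prefix: str, pad: int = 3) -> Tuple[str, str]:
    """Lower and upper bounds that approximate LIKE 'prefix%'."""
    return prefix, prefix + MAX_CHAR * pad


def like_prefix_via_range(values: List[str], prefix: str) -> List[str]:
    """Select values inside the approximate range for the prefix."""
    lo, hi = prefix_range(prefix)
    return [v for v in values if lo <= v <= hi]
```

The upper bound is only approximate: a value that starts with the prefix followed by more than `pad` copies of `MAX_CHAR` falls outside it, which may be tolerable for statistics but not for exact index bounds.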

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Kyotaro HORIGUCHI
Thank you for your understanding on that point. At Wed, 21 Sep 2011 20:35:02 -0400, Robert Haas robertmh...@gmail.com wrote ...while Kyotaro Horiguchi clearly feels otherwise, citing the statistic that about 100 out of 7000 Japanese characters fail to work properly:

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Robert Haas
On Thu, Sep 22, 2011 at 12:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I'm a bit perplexed as to why we can't find a non-stochastic way of doing this. [ collations suck ] Ugh. Now, having said that, I'm starting to wonder again why it's worth our

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Greg Stark
On Thu, Sep 22, 2011 at 1:49 PM, Robert Haas robertmh...@gmail.com wrote: My thought was that it would avoid the need to do any character incrementing at all.  You could just start scanning forward as if the operator were >= and then stop when you hit the first string that doesn't have the same

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Robert Haas
On Thu, Sep 22, 2011 at 8:59 AM, Greg Stark st...@mit.edu wrote: On Thu, Sep 22, 2011 at 1:49 PM, Robert Haas robertmh...@gmail.com wrote: My thought was that it would avoid the need to do any character incrementing at all.  You could just start scanning forward as if the operator were >= and

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Sep 22, 2011 at 8:59 AM, Greg Stark st...@mit.edu wrote: But the whole problem is that not all the strings with the initial substring are in a contiguous block. If that were true for the sorts of indexes we're using for LIKE queries, the

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Greg Stark
On Thu, Sep 22, 2011 at 2:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: The essential problem here is "when can you stop scanning, given a pattern with this prefix?", and btree doesn't know any more about that than make_greater_string does; it would in fact have to use make_greater_string or

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Sep 22, 2011 at 12:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Now, having said that, I'm starting to wonder again why it's worth our trouble to fool with encoding-specific incrementers.  The exactness of the estimates seems unlikely to be

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Tom Lane
Greg Stark st...@mit.edu writes: On Thu, Sep 22, 2011 at 2:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: The essential problem here is "when can you stop scanning, given a pattern with this prefix?", and btree doesn't know any more about that than make_greater_string does; it would in fact have to

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Robert Haas
On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, I won't stand in the way of the patch as long as it's modified to limit the number of values considered for any one character position to something reasonably small. One thing I was thinking about is that it would be

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: One thing I was thinking about is that it would be useful to have some metric for judging how well any given algorithm that we might pick here actually works. Well, the metric that we were indirectly using earlier was the number of characters in a

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Robert Haas
On Thu, Sep 22, 2011 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the metric that we were indirectly using earlier was the number of characters in a given locale for which the algorithm fails to find a greater one (excluding whichever character is last, I guess, or you could just

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Sep 22, 2011 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, the metric that we were indirectly using earlier was the number of characters in a given locale for which the algorithm fails to find a greater one (excluding whichever

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-21 Thread Robert Haas
On Tue, Sep 13, 2011 at 10:13 PM, Kyotaro HORIGUCHI horiguchi.kyot...@oss.ntt.co.jp wrote: This is a rebased version of the patch `Allow encoding specific character incrementer' (https://commitfest.postgresql.org/action/patch_view?id=602). In addition to the patch, an increment sanity-check program for the new

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: As I understand it, the issue here is that when we try to generate suitable > and < quals for a LIKE expression, we need to find a string which is greater than the prefix we're searching for, and the existing algorithm sometimes fails. But there seems

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-21 Thread Robert Haas
On Wed, Sep 21, 2011 at 9:49 PM, Tom Lane t...@sss.pgh.pa.us wrote: The main risk that I foresee with the proposed approach is that if you have, say, a 4-byte final character, you could iterate through a *whole lot* (millions) of larger encoded characters, with absolutely no hope of making a

Re: [HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-21 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I'm a bit perplexed as to why we can't find a non-stochastic way of doing this. Because the behavior of common collation algorithms is so wacky as to approach stochasticity. In particular, as soon as your string contains a mix of letter and non-letter

[HACKERS] [v9.2] make_greater_string() does not return a string in some cases

2011-09-13 Thread Kyotaro HORIGUCHI
This is a rebased version of the patch `Allow encoding specific character incrementer' (https://commitfest.postgresql.org/action/patch_view?id=602). In addition to the patch, an increment sanity-check program for the new functions pg_generic_charinc and pg_utf8_increment is attached. -- Kyotaro Horiguchi NTT Open