On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Anyway, I won't stand in the way of the patch as long as it's modified
>>> to limit the number of values considered for any one character position
>>> to something reasonably small.
>
>> I think that limit in both the old and new code is 1, except that the
>> new code does it more efficiently.
>
>> Am I confused?
>
> Yes, or else I am.  Consider a 4-byte UTF8 character at the end of the
> string.  The existing code increments the last byte up to 255 (rejecting
> everything past 0xBF), then gives up and truncates that character away.
> So the maximum number of tries for that character position is between 0
> and 127 depending on what the original character was (with at most 63 of
> the incremented values getting past the verifymbstr test).
>
> The proposed patch is going to iterate through all Unicode code points
> up to U+7FFFFF before giving up.  Since it's possible that we need to
> increment something further left to succeed at all, this doesn't seem
> like a good plan.

I think you're misreading the code.  It does this:

while (len > 0)
{
    boring stuff;
    if (charincfunc(lastchar, charlen))
    {
        more boring stuff;
        if (we made a greater string)
            return it;
        cleanup;
    }
   truncate away last character;
}

I don't see how that's ever going to try more than one character in
the same position.

What may be confusing you is that the old code has two loops: an outer
loop that tests whether we've made a greater string, and an inner loop
that tests whether we've made a validly encoded string at all.  In the
new code, at least in the UTF-8 case, the inner loop is GONE
altogether.  Instead of iterating until we construct a valid
character, we just use our mad UTF-8 skillz to assemble one, and
return it.

Or else I need to go drink a few cups of tea and look at this again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to