Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I can reproduce an out-of-memory condition (basically, replace() is
>>> going into an infinite loop because of the invalid input) but I'm
>>> not seeing any crash.
> 
>> replace_text reads past the end of source string, byte by byte (or
>> character by character, not sure), and eventually tries to read from an
>> invalid address which causes a segfault. It happens here when start_posn
>> == 367368.
> 
> Hm, must be memory-layout-dependent.  On mine, the output string buffer
> is growing fast enough to ensure there's still RAM to read, up till the
> kernel says no more.
> 
> Anyway the problem is that pg_utf2wchar_with_len silently drops the
> trailing incomplete character in its input, causing text_position_next
> to think the pattern is empty, causing an infinite loop because
> curr_posn never advances.  replace_text already tried to guard against
> empty pattern, but it doesn't know about this case.
> 
> What I intend to do to fix this is to modify the users of
> text_position_next to believe the string lengths saved by
> text_position_setup, rather than using TEXTLEN() to compute
> the lengths.  This will effectively make replace_text and
> friends consistently act as though the partial character isn't there.
> 
> In the long run it might be better to make pg_utf2wchar_with_len throw
> an error for bad input, but I'm quite unsure of the consequences of
> that, in view of the existing comment "not ours to throw error".
> Anyway such a potentially-significant behavioral change doesn't seem
> like a good idea to back-patch.  (We seem to have this bug in one form
> or another clear back to 7.3...)

I agree we should do the above changes for the sake of robustness, but
isn't the real problem here that chr function can return invalid byte
sequences? That was actually discussed a while back (starting at
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00010.php), but
that was inconclusive.

IMHO chr should at the very least not return invalid byte sequences.
Limiting it to ascii range is not a bad idea either, though that might
break some applications.

Is there any other known loopholes to get invalid data in the database?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Reply via email to