Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> I can reproduce an out-of-memory condition (basically, replace() is
>>> going into an infinite loop because of the invalid input) but I'm
>>> not seeing any crash.
>
>> replace_text reads past the end of source string, byte by byte (or
>> character by character, not sure), and eventually tries to read from an
>> invalid address which causes a segfault. It happens here when start_posn
>> == 367368.
>
> Hm, must be memory-layout-dependent. On mine, the output string buffer
> is growing fast enough to ensure there's still RAM to read, up till the
> kernel says no more.
>
> Anyway the problem is that pg_utf2wchar_with_len silently drops the
> trailing incomplete character in its input, causing text_position_next
> to think the pattern is empty, causing an infinite loop because
> curr_posn never advances. replace_text already tried to guard against
> empty pattern, but it doesn't know about this case.
>
> What I intend to do to fix this is to modify the users of
> text_position_next to believe the string lengths saved by
> text_position_setup, rather than using TEXTLEN() to compute
> the lengths. This will effectively make replace_text and
> friends consistently act as though the partial character isn't there.
>
> In the long run it might be better to make pg_utf2wchar_with_len throw
> an error for bad input, but I'm quite unsure of the consequences of
> that, in view of the existing comment "not ours to throw error".
> Anyway such a potentially-significant behavioral change doesn't seem
> like a good idea to back-patch. (We seem to have this bug in one form
> or another clear back to 7.3...)

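To make the failure mode quoted above concrete, here is a minimal standalone C sketch. It is not the PostgreSQL source: usable_len and count_matches are hypothetical stand-ins, and the only assumption carried over from the thread is that a lenient decoder silently drops a trailing incomplete multibyte character. The point is that a pattern consisting of one truncated character has a nonzero byte length but an effective length of zero, so the emptiness guard has to test the length the search will actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of a UTF-8 character, from its lead byte. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;                   /* invalid lead byte: step over one byte */
}

/*
 * Hypothetical stand-in for a lenient decoder: returns how many bytes are
 * covered by complete characters, silently dropping a trailing partial one.
 */
static size_t usable_len(const char *s, size_t len)
{
    size_t pos = 0;

    while (pos < len)
    {
        size_t clen = utf8_char_len((unsigned char) s[pos]);

        if (pos + clen > len)
            break;              /* incomplete trailing character: drop it */
        pos += clen;
    }
    return pos;
}

/*
 * Count non-overlapping matches of pat in src. Both lengths are computed
 * once and then used consistently, instead of mixing raw byte lengths with
 * decoded lengths.
 */
static int count_matches(const char *src, size_t src_len,
                         const char *pat, size_t pat_len)
{
    size_t s_len, p_len, pos = 0;
    int    n = 0;

    assert(src != NULL && pat != NULL);
    s_len = usable_len(src, src_len);
    p_len = usable_len(pat, pat_len);

    /*
     * Guard on the effective length. Testing pat_len instead would let a
     * truncated pattern through; every position would then "match" and
     * pos would never advance.
     */
    if (p_len == 0)
        return 0;

    while (pos + p_len <= s_len)
    {
        if (memcmp(src + pos, pat, p_len) == 0)
        {
            n++;
            pos += p_len;
        }
        else
            pos += utf8_char_len((unsigned char) src[pos]);
    }
    return n;
}
```

With this sketch, a pattern such as the single byte 0xC3 (the first half of a two-byte character) is treated exactly like an empty pattern, which is the "act as though the partial character isn't there" behaviour described above.
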
I agree we should make the above changes for the sake of robustness, but isn't the real problem here that the chr function can return invalid byte sequences? That was actually discussed a while back (starting at http://archives.postgresql.org/pgsql-hackers/2007-04/msg00010.php), but the discussion was inconclusive. IMHO chr should at the very least not return invalid byte sequences. Limiting it to the ASCII range is not a bad idea either, though that might break some applications.

Are there any other known loopholes for getting invalid data into the database?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com