Re: [HACKERS] Bug in UTF8-Validation Code?

Mark Dilger Sat, 31 Mar 2007 19:53:14 -0800

Mark Dilger wrote:

Bruce Momjian wrote:
Added to TODO:
* Fix cases where invalid byte encodings are accepted by the database,
      but throw an error on SELECT
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00767.php
Is anyone working on fixing this bug?
Hi, has anyone volunteered to fix this bug? I did not see any reply on the mailing list to your question above.
mark

OK, I can take a stab at fixing this. I'd like to state some assumptions so people can comment and reply:

I assume that I need to fix *all* cases where invalid byte encodings get into the database through functions shipped in the core distribution.

I assume I do not need to worry about people getting bad data into the system through their own database extensions.

I assume that the COPY problem discussed up-thread goes away once you eliminate all the paths by which bad data can get into the system. However, existing database installations with bad data already loaded will not be magically fixed with these code patches.

Do any of the string functions (see http://www.postgresql.org/docs/8.2/interactive/functions-string.html) run the risk of generating invalid utf8 encoded strings? Do I need to add checks? Are there known bugs with these functions in this regard?

If not, I assume I can add mbverify calls to the various input routines (textin, varcharin, etc.) where invalid utf8 could otherwise enter the system.
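
To make concrete what such a check has to reject, here is a minimal standalone sketch of a UTF-8 well-formedness test. It is not the backend's code: the name utf8_is_valid is hypothetical, and the rules are the ones I understand from RFC 3629 (no stray or missing continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF). An input routine would call something like this on the incoming bytes and raise an error on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not PostgreSQL source: returns true when the
 * len bytes starting at s are well-formed UTF-8 (assumed RFC 3629 rules).
 */
bool
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        unsigned char c = s[i];
        size_t      n;          /* continuation bytes expected */
        unsigned int min;       /* smallest code point legal at this length */
        unsigned int cp;        /* decoded code point */

        if (c < 0x80)
        {
            i++;                /* plain ASCII */
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { n = 1; min = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; min = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; min = 0x10000; cp = c & 0x07; }
        else
            return false;       /* stray continuation or invalid lead byte */

        if (len - i - 1 < n)
            return false;       /* sequence truncated at end of input */

        for (size_t k = 1; k <= n; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return false;       /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;       /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;       /* UTF-16 surrogate */

        i += n + 1;
    }
    return true;
}
```

The sketch decodes each code point rather than only checking byte patterns, so overlong forms and surrogates are caught by the same range tests. It only covers UTF-8; the other server encodings would each need their own rules.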

I assume that this work can be limited to HEAD and that I don't need to back-patch it. (I suspect this assumption is a contentious one.)


Advice and comments are welcome,

mark
