I have been looking at fixing the issue of accepting strings that are not valid in the database encoding. It appears from previous discussion that we need to add a call to pg_verifymbstr() to the relevant input routines and ensure that the chr() function returns a valid string. That leaves several issues:

. which are the relevant input routines? I have identified the following as needing remediation: textin(), bpcharin(), varcharin(), anyenum_in(), namein(). Do we also need one for cstring_in()? Does the xml code already handle this as part of xml validation? (A sketch of what I have in mind for textin() is below the list.)

. what do we need to do to make the verification code more efficient? I think we need to address the correctness issue first, but putting the check on every input path will certainly make us want to speed the verifier up. For example, I'm wondering if it might benefit from having a tiny cache (a rough illustration is below).

. for chr() under UTF8, it seems to be generally agreed that the argument should represent the code point and the function should return the correspondingly encoded character (the mapping is sketched below). If so, possibly the argument should be a bigint to accommodate the full range of possible code points. It is not clear what the argument should represent for other multi-byte encodings for any value above 127, and it is similarly unclear what ascii() should return in such cases. I would be inclined just to error out there.
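
To make the first item concrete, here is roughly what I have in mind for textin(); everything except the pg_verifymbstr() call is just the existing varlena input logic, so treat it as a sketch rather than a patch:

    /* needs postgres.h, fmgr.h and mb/pg_wchar.h */
    Datum
    textin(PG_FUNCTION_ARGS)
    {
        char   *inputText = PG_GETARG_CSTRING(0);
        int     len = strlen(inputText);
        text   *result;

        /* reject byte sequences that are invalid in the server
         * encoding; pg_verifymbstr() ereports when noError is false */
        pg_verifymbstr(inputText, len, false);

        result = (text *) palloc(len + VARHDRSZ);
        SET_VARSIZE(result, len + VARHDRSZ);
        memcpy(VARDATA(result), inputText, len);
        PG_RETURN_TEXT_P(result);
    }

The other routines listed above would get the same one-line treatment.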
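
On the "tiny cache" idea, something along these lines is what I was picturing; verify_mbstr_cached() and VERIFY_CACHE_SLOTS are made-up names, and as the comment notes a bare hash compare is not actually safe, so this is only an illustration of the shape of it:

    /* needs postgres.h, access/hash.h and mb/pg_wchar.h */
    #define VERIFY_CACHE_SLOTS 16

    static uint32 verify_cache[VERIFY_CACHE_SLOTS];

    static void
    verify_mbstr_cached(const char *str, int len)
    {
        uint32  h = DatumGetUInt32(hash_any((const unsigned char *) str,
                                            len));
        int     slot = h % VERIFY_CACHE_SLOTS;

        /* a matching hash very probably means we verified this string
         * recently, but a collision would wrongly skip verification;
         * a real version would have to remember the bytes too */
        if (verify_cache[slot] == h)
            return;

        pg_verifymbstr(str, len, false);
        verify_cache[slot] = h;
    }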
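
For the UTF8 chr() case, the code point to byte sequence mapping at least is well defined. A sketch, with codepoint_to_utf8() a hypothetical helper (a real version would also need to reject the surrogate range 0xD800-0xDFFF):

    /* returns the number of bytes written to buf, 0 if out of range */
    static int
    codepoint_to_utf8(uint32 cp, unsigned char *buf)
    {
        if (cp < 0x80)
        {
            buf[0] = (unsigned char) cp;
            return 1;
        }
        else if (cp < 0x800)
        {
            buf[0] = 0xC0 | (cp >> 6);
            buf[1] = 0x80 | (cp & 0x3F);
            return 2;
        }
        else if (cp < 0x10000)
        {
            buf[0] = 0xE0 | (cp >> 12);
            buf[1] = 0x80 | ((cp >> 6) & 0x3F);
            buf[2] = 0x80 | (cp & 0x3F);
            return 3;
        }
        else if (cp <= 0x10FFFF)    /* largest valid Unicode code point */
        {
            buf[0] = 0xF0 | (cp >> 18);
            buf[1] = 0x80 | ((cp >> 12) & 0x3F);
            buf[2] = 0x80 | ((cp >> 6) & 0x3F);
            buf[3] = 0x80 | (cp & 0x3F);
            return 4;
        }
        return 0;                   /* caller should ereport */
    }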

cheers

andrew
