Re: [HACKERS] invalid UTF-8 via pl/perl

2010-03-12 Thread Hannu Krosing
On Mon, 2010-03-08 at 21:52 -0500, Andrew Dunstan wrote: Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: So SPI interface should also be fixed, either from perl side, or maybe from inside SPI ? SPI has every right to assume that data it's given is already in

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-03-12 Thread Andrew Dunstan
Hannu Krosing wrote: On Mon, 2010-03-08 at 21:52 -0500, Andrew Dunstan wrote: Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: So SPI interface should also be fixed, either from perl side, or maybe from inside SPI ? SPI has every right to assume

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-03-08 Thread Hannu Krosing
On Sat, 2010-01-02 at 20:51 -0500, Andrew Dunstan wrote: Andrew Dunstan wrote: I think the plperl glue code should check returned strings using pg_verifymbstr(). Please test this patch. I think we'd probably want to trap the encoding error and issue a customised error message,

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-03-08 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes: So SPI interface should also be fixed, either from perl side, or maybe from inside SPI ? SPI has every right to assume that data it's given is already in the database encoding. regards, tom lane

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-03-08 Thread Andrew Dunstan
Tom Lane wrote: Hannu Krosing ha...@2ndquadrant.com writes: So SPI interface should also be fixed, either from perl side, or maybe from inside SPI ? SPI has every right to assume that data it's given is already in the database encoding. Yeah, looks

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-04 Thread Peter Eisentraut
On Sun, 2010-01-03 at 18:40 -0500, Andrew Dunstan wrote: Incidentally, I guess we need to look at plpython and pltcl for similar issues. I confirm that the same issue exists in plpython.

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-04 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: andrew=# select 'a' || invalid_utf_seq() || 'b'; ERROR: invalid byte sequence for encoding UTF8: 0xd0 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-04 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: How about we just change the hint so it also refers to the possibility that the data comes from a PL? That would save lots of trouble. Maybe just lose the hint altogether. It's not adding that much, and I seem to recall that there have already been

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Andrew Dunstan
Andrew Dunstan wrote: Andrew Dunstan wrote: I think the plperl glue code should check returned strings using pg_verifymbstr(). Please test this patch. I think we'd probably want to trap the encoding error and issue a customised error message, but this plugs all the holes I can see

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: One thing that I am pondering is: how does SPI handle things if the client encoding and server encoding are not the same? What? client_encoding is not used anywhere within the backend. Everything should be server_encoding.

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: One thing that I am pondering is: how does SPI handle things if the client encoding and server encoding are not the same? What? client_encoding is not used anywhere within the backend. Everything should be server_encoding.

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Andrew Dunstan
I wrote: I think the attached patch plugs the direct SPI holes as well. There are two issues with this patch. First, how far, if at all, should it be backpatched? All the way, or 8.3, where we tightened the encoding rules, or not at all? Second, it produces errors like this: andrew=#

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread David E. Wheeler
On Jan 3, 2010, at 11:54 AM, Andrew Dunstan wrote: There are two issues with this patch. First, how far if at all should it be backpatched? All the way, or 8.3, where we tightened the encoding rules, or not at all? 8.3 seems reasonable. Second, It produces errors like this: andrew=#

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: andrew=# select 'a' || invalid_utf_seq() || 'b'; ERROR: invalid byte sequence for encoding UTF8: 0xd0 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes: There are two issues with this patch. First, how far if at all should it be backpatched? All the way, or 8.3, where we tightened the encoding rules, or not at all? Forgot to mention --- I'm not in favor of backpatching. First because tightening

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Andrew Dunstan
David E. Wheeler wrote: Second, It produces errors like this: andrew=# select 'a' || invalid_utf_seq() || 'b'; ERROR: invalid byte sequence for encoding UTF8: 0xd0 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-03 Thread Andrew Dunstan
Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: andrew=# select 'a' || invalid_utf_seq() || 'b'; ERROR: invalid byte sequence for encoding UTF8: 0xd0 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which
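
For readers wondering why the error names 0xd0 specifically, a rough illustration follows. It is only a sketch in Python, not anything from the patch or from the backend: the helper name first_invalid_byte is made up here, and Python's strict UTF-8 decoder stands in for the server's own validation. The input bytes assume that invalid_utf_seq() returns the single byte 0xd0, which is what the error message suggests.

```python
def first_invalid_byte(data: bytes):
    """Return (offset, byte) of the first invalid UTF-8 byte, or None.

    Hypothetical helper for illustration only; it relies on Python's
    strict decoder rather than PostgreSQL's validation code.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset where decoding failed
        return exc.start, data[exc.start]
    return None


# Assumed result of 'a' || invalid_utf_seq() || 'b' in the example above.
# 0xd0 announces a two-byte sequence, but 'b' is not a continuation byte.
hit = first_invalid_byte(b"a\xd0b")
if hit is not None:
    offset, byte = hit
    print(f"invalid byte 0x{byte:02x} at offset {offset}")
```

The point is only that the byte reported is the lead byte whose sequence was never completed, which matches what the server prints.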

[HACKERS] invalid UTF-8 via pl/perl

2010-01-02 Thread Hannu Krosing
It is possible to get an invalid byte sequence into a text field via pl, in this case pl/perl: ---8<--8<--8<--8<--8<--8<--- CREATE TABLE utf_test ( id serial PRIMARY KEY, data character varying ); CREATE OR REPLACE FUNCTION invalid_utf_seq() RETURNS character varying AS

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-02 Thread Andrew Dunstan
Hannu Krosing wrote: [plperl can return data that is not valid in the server encoding and it is not caught] This results in a table which has an invalid utf sequence in it and consequently does not pass dump/load. What would be the best place to fix this? Should there be checks in all text

Re: [HACKERS] invalid UTF-8 via pl/perl

2010-01-02 Thread Andrew Dunstan
Andrew Dunstan wrote: I think the plperl glue code should check returned strings using pg_verifymbstr(). Please test this patch. I think we'd probably want to trap the encoding error and issue a customised error message, but this plugs all the holes I can see with the possible
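
The idea proposed here (validate every string a PL hands back before it is stored) can be sketched outside the backend. The following is a minimal Python illustration under stated assumptions, not the plperl patch: is_valid_utf8 is a hypothetical helper, and Python's strict decoder is used as a stand-in for pg_verifymbstr(), whose exact behaviour is not reproduced here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8.

    Hypothetical stand-in for an encoding check at the PL boundary;
    it is not PostgreSQL's implementation.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# A lone 0xd0 followed by ASCII is rejected ...
print(is_valid_utf8(b"a\xd0b"))
# ... while 0xd0 0xb0 (U+0430) is a complete two-byte sequence.
print(is_valid_utf8(b"a\xd0\xb0b"))
```

A check of this shape has to sit on every path by which PL data reaches the server, which is why the SPI path comes up later in the thread as well as function return values.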